Preface



This book contains the papers presented at the First International Conference on 
Computers and Games (CG’98) held at the Electrotechnical Laboratory (ETL), 
in Tsukuba, Japan, on November 11-12, 1998. 

The CG’98 focuses on all aspects of research related to computers and games. 
Relevant topics include, but are not limited to, the current state of game-playing 
programs. The book contains new theoretical developments in game-related re- 
search, general scientific contributions produced by the study of games, social 
aspects of computer games, mathematical games, cognitive research of how hu- 
mans play games, and so on. As this volume shows, CG’98 is an international 
conference, with participants from many different countries who have different 
backgrounds and hence exhibit different views on computers and games. 

The Conference was the first one in a series of conferences on this topic. It 
was a direct follow-up of many successful computer-games-related events held in 
Japan, such as the series of four Game Programming Workshops (GPW’94 to 
GPW’97) and the IJCAI-97 Workshop on Computer Games. 

The technical program consisted of a keynote lecture, titled: Predictions (by 
H.J. van den Herik), and 21 presentations of accepted papers. The conference 
was preceded by an informal Workshop on November 10, 1998. The Program 
Committee (PC) received 35 submissions. Each paper was sent to three referees, 
who were selected on the basis of their expert knowledge. Twelve papers were 
accepted immediately, 12 papers were not accepted, and 11 papers were returned 
to the authors with the request to improve them, and with the statement that 
they would be refereed again. Finally, with the help of many referees (see the 
end of this preface), the PC accepted 21 papers for presentation and publication. 

Originally, we tried to sequence the contributions in some logical order, such 
as: from mathematical games via computer science to cognitive sciences, but 
we failed. Neither did the listing of the contents as mentioned above solve the 
problem of ordering the papers. In a way it is a fortunate coincidence that such 
an order could not be established, since it shows that the topic of computers 
and games has an interdisciplinary nature. Nevertheless, to structure the book 
to some extent we distinguish, somewhat arbitrarily, between four sections: (1) 
Search and Strategies, (2) Learning and Pattern Acquisition, (3) Theory, and 
(4) Go, Tsume Shogi, and Heian Shogi. 



Search and Strategies 

In the proceedings of this conference, the first set of six contributions deals with 
search and strategies. The editors believe that search is an important factor 
when trying to solve simple games or to play complex games. Although this is 
disputable for the game of Go, it certainly is true for chess-like games. Moreover, 
any strategy defined as a combination of straightforward movement, indirect 
approaches, and prophylaxis, is based on search and knowledge. The nature of
a game determines which factor is predominant. 

The first paper by Junghanns and Schaeffer, titled: Relevance Cuts: Localizing 
the Search, deals with single-agent search. The authors apply their new pruning
technique to Sokoban. The idea is to use the influence of a move as a measure
of relevance. Hence, they distinguish between local (relevant) moves and non- 
local (not relevant) moves, with respect to the sequence of moves leading to the 
current state. The new pruning technique uses the m previous moves to decide 
if a move is relevant in the current context; if not, the move must be cut off. 
The application of the technique on a 90-problem test set using search, limited 
to 20 million nodes, leads to 44 solutions. So, much more research is needed to 
solve all 90 problems. 

The contribution by Björnsson and Marsland, titled: Multi-Cut Pruning in
Alpha-Beta Search, examines the benefits of investing additional search effort 
at cut-nodes by expanding other move alternatives as well. Their results when 
applied to the game of chess show a strong correlation between the number of 
promising move alternatives at cut-nodes and an emerging new principal varia- 
tion. This correlation can also be exploited otherwise. Hence, there is still a great 
deal of research to be done on other innovative methods based on investigating 
other move options. 

Breuker, Van den Herik, Uiterwijk, and Allis treat the well-known graph- 
history-interaction (GHI) problem. Their contribution, titled: A Solution to the
GHI Problem for Best-First Search, introduces the notion of twin nodes, which
makes it possible to distinguish nodes according to their history. The implementation
of this idea, called BTA (Base-Twin Algorithm), is applied to proof-number
search. Experimental results in the field of computer chess confirm the
claim that the GHI problem has been solved for best-first search. 

Under the heading: Optimal Play against Best Defence: Complexity and 
Heuristics, Frank and Basin investigate the best defence model of an imper- 
fect information game. They prove that finding optimal strategies for such a 
model is NP-complete in the size of the game tree. The introduction of two new 
heuristics, viz. beta-reduction and iterative biasing, appears to work well. The 
general idea is that there is a reduction of non-locality due to the introduction of 
mutual relationship between the various choices at MAX nodes. The heuristics 
are applied to a Bridge problem set and actually outperform the human experts 
who produced the model solutions. 

Gao, Iida, Uiterwijk, and Van den Herik present a generalization of OM
search, called (D, d)-OM search. Their paper, titled: A Speculative Strategy, investigates
whether it is worthwhile to deviate from the objectively best path
when knowing that the opponent only searches to a depth d, whereas the player
(e.g., the program) searches to a depth D > d. It is shown that a difference
in search depth can be exploited by deliberately choosing a suboptimal move in
order to gain a larger advantage than when playing the optimal move. Some
experiments in the domain of Othello confirm the effectiveness of the proposed 
strategy. 

In their paper: An Adversarial Planning Approach to Go, Willmott, Richardson,
Bundy, and Levine propose an alternative to the usual procedure of searching
a tree of possible move sequences combined with an evaluation function. They
model the goals of the players and their strategies for achieving these goals. It 
implies searching the space of possible goal expansions, which is typically much 
smaller than the space of move sequences. They describe how adversarial hier- 
archical task network planning can provide a framework for goal-directed game 
playing. The program Gobi has been successfully tested on two test sets of 
Go problems taken from Yoshinori’s four-volume series. It was observed that 
strengthening Gobi’s defensive knowledge led to an improvement in attacking 
plans, and vice versa. This reflects the fact that the better opponent model is 
more likely to find refutations for poor attacking plans. 



Learning and Pattern Acquisition 

The second set of five contributions deals with learning and pattern acquisition. 
The techniques described are applied to the following games: Shogi, Othello, 
Tsume Go (twice), and Checkers. 

The first paper of this set, by Beal and Smith, attempts to determine whether 
sensible values for Shogi pieces can be obtained in the same manner as for western 
chess pieces. Under the heading: First Results from using Temporal Difference 
Learning in Shogi, the authors arrive at values that perform well in matches 
against programs with handcrafted values. They stress the fact that the Shogi 
piece values were learnt from self-play without any domain-specific knowledge 
being supplied. It is remarkable to note that Shogi experts are traditionally 
reluctant to assign values to the pieces. The authors claim that the method is 
also applicable to learning an appropriate weight for positional evaluation terms 
in Shogi. 

Even more advanced is the topic of learning features to be used in evaluation 
functions. This topic is treated by Buro in his paper From Simple Features 
to Sophisticated Evaluation Functions. He discusses a practical framework for 
the semi-automatic construction of evaluation functions for games. Based on 
a structured evaluation-function representation, a procedure for exploring the 
feature space is presented. So, new features are discovered in a computationally 
feasible way. Convincing experimental results for Othello are given and several 
theoretical issues are discussed. 

In their paper: A Two-Step Model of Pattern Acquisition: Application to 
Tsume-Go, Kojima and Yoshikawa carry out a cognitive study. The first step
is the pattern acquisition step, which uses only positive examples. The second 
step, the pattern refinement step, uses both positive and negative examples. The 
combination of positive and negative examples leads to precise conditions and 
also to a way of conflict resolution. Three distinct algorithms are introduced for 
the first step, and two for the second one. The domain of application is Tsume- 
Go (life and death problems). The performances of six conditions are compared. 
The best performance is achieved by a condition which gives 31% of the answers 
correctly. This result equals the achievement of a one-dan human player. 

Sasaki, Sawada, and Yoshimura focus on Tsume-Go problems positioned on
a 9x9 board which have a unique solution. Under the heading: A Neural Network
Program of Tsume-Go, they describe a network with 543 neurons dealing with 
Kurosen-Shiroshi problems. The backpropagation method is applied and the 
performance of the network is roughly equivalent to a one-dan human player. 
The authors claim that their neural network can be used as a component of
strong Tsume-Go and Go programs.

Although the Checkers program Chinook (by Schaeffer et al.) has been 
crowned as the champion of man-machine contests, the game has not lost any 
of its attractiveness as a research domain. In their paper: Distributed Decision 
Making in Checkers, Giraldez and Borrajo use the game as a testing ground for 
techniques for distributed decision making and learning by Multi-Agent DEcision
Systems (Mades). They propose a new architecture for knowledge-based
systems dedicated to Checkers playing. Mades should learn how to combine in- 
dividual decisions, in such a way that it outperforms programs without a priori 
knowledge of the quality of each model. 

Theory 

Theory is an outstanding tool for the verification of ideas. We admit that good 
ideas in the context of computers and games must be implementable, but if 
the implemented ideas contain unexpected errors, they give computers a bad 
reputation. So, the theoretical contributions constitute an important part of 
this book. We arranged five papers under this heading. They deal with solution 
trees, heap games, impartial games, complexity, and thermography. 

Pijls and De Bruin show in their contribution: Game Tree Algorithms and 
Solution Trees, that the concept of solution tree is the basic idea underlying the 
minimax principle. They distinguish between two types of solution trees: max 
trees and min trees. Subsequently, they formulate a cut-off criterion in terms 
of solution trees, which eliminates nodes from the search without affecting the 
result. Moreover, they show that any algorithm actually constructs a superpo- 
sition of a max and a min solution tree. At the end of their paper they discuss 
solution trees in relation to alphabeta, SSS*, and MT-SSS. 

Fraenkel and Zusman analyse an extension of Wythoff’s game and provide 
a polynomial-time strategy. Their contribution titled: A New Heap Game, deals 
with k heaps of tokens (k ≥ 3). It is a two-player game with the following rules:
a move is either taking a positive number of tokens from at most k - 1 heaps, or
removing the same positive number of tokens from all the k heaps. The authors 
remark that the Sprague-Grundy function g of a game provides a strategy for the 
sum of several games. They express their interest in computing the g-function
for this new heap game, but state that they are unaware of the complexity of 
the problem. 

The contribution: Infinite Cyclic Impartial Games, by Fraenkel and Rahat, 
treats the family of locally path-bounded digraphs, which is a class of infinite 
digraphs. The authors show that it is relatively easy to compute an optimal 
strategy for a combinatorial game on this particular class of graphs. Whenever 
possible, they achieve a win in a finite number of moves. This is done by proving
that the Generalised Sprague-Grundy function is unique and has finite values 
on this class of graphs. 

On the Complexity of Tsume-Go is the title of Crâşmaru’s contribution. With
the game of Go as a starting point, the author embarks upon an analysis of 
the concept of alive vs. dead, for which he proposes a mathematical model. 
Tsume-Go problems are investigated and it is shown that this kind of problem 
is NP-complete. 

In Extended Thermography for Multiple Kos in Go, Spight discusses the concept
of thermography. Many Go positions give rise to combinatorial games. The
mean value of the game corresponds to the count, and its temperature to the 
value of the play. Thermography determines the mean value and the temper- 
ature of a combinatorial game. Moreover, thermography has been generalized 
to include positions containing a single ko. Spight extends the notion of ther- 
mography even further, namely to include positions with multiple kos. He also 
introduces a method for pruning redundant branches of the game tree. 

Go, Tsume Shogi, and Heian Shogi 

The last set of five contributions deals with Go, Tsume Shogi, and Heian Shogi. 
All five papers provide relevant information on the games and put them in 
perspective. 

Although the interest in Go research has increased considerably in the last 
decade, the playing strength of Go programs is still mediocre. Among the Go 
researchers, a feeling has emerged that developments in the world of chess also 
may crop up in the world of Go. 

As a first step, Müller contributes to this feeling in his contribution: Computer
Go: a Research Agenda. The author suggests that the obstacles to progress are 
posed by the current structure of the Go community and are at least as serious 
as the purely technical challenges. He introduces three proposals for large-scale 
Go projects, viz. (1) form teams funded by a large company (such as Deep 
Blue), (2) make public-domain source code available (such as Gnu Chess and 
Crafty), and (3) initiate as many university projects as possible. His main 
concern is to overcome the lack of critical human resources. Having seen the 
enthusiasm of the Go researchers at the CG’98, the editors believe that Go 
research has a bright future. 

In Go, the position evaluation is very important, but also very complex. 
So far, no good evaluation functions have been developed. One of the major 
factors for the evaluation of a position is the strength of a group. Tajima and 
Sanechika describe a new method for estimating the strength of a group, in
their paper: Estimating the Possible Omission Number for Groups in Go by 
the Number of n-th Dame. The authors have developed a simple method for 
making a rough estimation. They define a PON (Possible Omission Number) as 
a precise measure for the strength of groups. Using PON, their method calculates 
n-th dame (liberties). Experiments support the claim of the effectiveness of the 
method. 

The way of using Go terms while playing Go depends on the player’s skill.
Not every player uses the same notion to indicate a certain board characteristic. 
Yoshikawa, Kojima, and Saito have performed extensive cognitive research in this 
area. They report on their research in the paper: Relations between Skill and the 
Use of Terms - An Analysis of Protocols of the Game of Go. Three experiments 
are described in full detail. Starting with a profound analysis of their results, 
the authors developed a hypothesis, which they call the iceberg model, implying 
that the bulk of knowledge is not known to human players. Since it is crucial 
to make the knowledge of how to evaluate a Go position explicitly available for 
computer programs, protocol analyses and the modelling of thought processes 
remain an important issue for future research. 

Grimbergen provides a very readable overview of Tsume-Shogi programs, 
titled: A Survey of Tsume-Shogi Programs using Variable-Depth Search. Tsume- 
Shogi is the name for mating problems in Japanese chess. He discusses six dif- 
ferent Tsume-Shogi programs. Difficult Tsume-Shogi problems have solution se- 
quences which are longer than 20 plies. Hence, all programs have a variable search 
depth and use hashing techniques. The combination of transposition, domina- 
tion, and simulation leads to strong programs that outperform human experts. 
The best program is able to solve Microcosmos, a Tsume-Shogi problem with a
solution sequence of 1525 plies. 

Finally, in the contribution: Retrograde Analysis of the KGK Endgame in 
Shogi: Its Implications for Ancient Heian Shogi, Iida, Yoshimura, Morita, and
Uiterwijk examine the evolutionary changes that have occurred in the game of 
Shogi. They go back to the ancient game of Heian Shogi and investigate the 
game results of the KGK endgame (King and Gold vs. King) on NxN boards. 
Since Heian Shogi is only briefly described in the literature, the authors must 
guess which rules were applicable under which circumstances. The paper focuses 
on a logical interpretation of the change of rules at the time that the 8x8 board 
was replaced by a 9x9 board. Moreover, the authors demonstrate that the 10x10 
board is the largest NxN board on which the KGK endgame is a deterministic 
win (of course, with the exception of trivially drawn cases in which the Gold 
can be captured). Future research will focus on the relation between the given 
analysis of the KGK endgames and the reuse rule of captured pieces in modern 
Shogi. 

Relevance Cuts: Localizing the Search

Andreas Junghanns and Jonathan Schaeffer

Abstract. Humans can effectively navigate through large search spaces,
enabling them to solve problems with daunting complexity. This is largely 
due to an ability to successfully distinguish between relevant and irrele- 
vant actions (moves). In this paper we present a new single-agent search 
pruning technique that is based on a move’s influence. The influence 
measure is a crude form of relevance in that it is used to differentiate 
between local (relevant) moves and non-local (not relevant) moves, with 
respect to the sequence of moves leading up to the current state. Our 
pruning technique uses the m previous moves to decide if a move is rel- 
evant in the current context and, if not, to cut it off. This technique 
results in a large reduction in the search effort required to solve Sokoban 
problems. 

Keywords: single-agent search, heuristic search, Sokoban, local search, IDA*

1 Introduction and Motivation 

It is commonly acknowledged that the human ability to successfully navigate
through large search spaces is due to meta-level reasoning [3]. The relevance
of different actions when composing a plan is an important notion in that process. 
Each next action is viewed as one logically following in a series of steps to 
accomplish a (sub-)goal. An action judged as irrelevant is not considered. 

When searching small search spaces, the computer’s speed in base-level rea- 
soning can effectively overcome the lack of meta-level reasoning by simply enu- 
merating large portions of the search space. However, it is a trivial matter to 
pose a problem to the computer that is easy for a human to solve (using rea- 
soning) but is exponentially large to solve using standard search algorithms. We 
need to enhance computer algorithms to be able to reason at the meta-level if 
they are to successfully tackle these larger search tasks. In the world of com- 
puter games (two-player search), a number of meta-level reasoning algorithmic 
enhancements are well known, such as null-move searches [?] and futility cut-offs
[?]. For single-agent search, macro moves [?] are an example.

In this paper, we introduce relevance cuts. The search is restricted in the 
way it chooses its next action. Only actions that are relevant to previous actions 
can be performed, with a limited number of exceptions being allowed. The exact
definition of relevance is domain dependent. 

Consider an artist drawing a picture of a wildlife scene. One way of drawing 
the picture is to draw the bear, then the lake, then the mountains, and finally 
the vegetation. An alternate way is to draw a small part of the bear, then draw 
a part of the mountains, draw a single plant, work on the bear again, another 
plant, maybe a bit of lake, etc. The former corresponds to how a human would 
draw the picture: concentrate on an identifiable component and work on it until 
a desired level of completeness has been achieved. The latter corresponds to 
a typical computer method: the order in which the lines are drawn does not 
matter, as long as the final result is achieved. 

Unfortunately, most search algorithms do not follow the human example. At 
each node in the search, the algorithm will consider all legal moves regardless of 
their relevance to the preceding play. For example, in chess, consider a passed 
“a” pawn and a passed “h” pawn. The human will analyze the sequence of moves 
to, say, push the “a” pawn to queen. The computer will consider dubious (but 
legal) lines such as push the “a” pawn one square, push the “h” pawn one square, 
push the “a” pawn one square, etc. Clearly, considering alternatives like this is 
not cost-effective. 

What is missing in the above examples is a notion of relevance. In the chess 
example, having pushed the “a” pawn and then decided to push the “h” pawn, 
it seems silly to now return to considering the “a” pawn. If it really was nec- 
essary to push the “a” pawn a second time, why weren’t both “a” pawn moves 
considered before switching to the “h” pawn? Usually this switching back and 
forth (or “ping-ponging”) does not make sense but, of course, exceptions can be 
constructed. 

In other well-studied single-agent search domains, such as the N-puzzle and
Rubik’s Cube, the notion of relevance is not important. In both these problems, 
the geographic space of moves is limited, i.e. all legal moves in one position are 
“close” (or local) to each other. For two-player games, the effect of a move may 
be global in scope and therefore moves almost always influence each other (this is 
most prominent in Othello, and less so in chess). In contrast, a move in the game 
of Go is almost always local. In non-trivial, real-world problems, the geographic 
space might be large, allowing for local and non-local moves. 

This paper introduces relevance cuts and demonstrates their effectiveness in 
the one-player game Sokoban. For Sokoban we use a new influence metric that 
reflects the structure of the maze. A move is considered relevant if it is influencing 
all the previous m moves made. The search is only allowed to make relevant 
moves with respect to previous moves and only a limited number of exceptions 
is permitted. With these restrictions in place, the search is forced to spend its 
effort locally, since random jumps within the search area are discouraged. In the 
meta-reasoning sense, forcing the program to consider local moves is making it 
adopt a pseudo-plan; an exception corresponds to a decision to change plans. 
This results in a decrease of the average branching factor of the search tree. 
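
The restriction described above can be sketched as a move filter. This is only a minimal illustration under stated assumptions, not the Rolling Stone implementation: the function names, the `max_exceptions` budget, and the toy distance-based `near` predicate standing in for the maze-aware influence metric are all hypothetical.

```python
def is_relevant(move, history, m, influences):
    """A move is relevant if it influences each of the last m moves played."""
    recent = history[-m:] if m > 0 else []
    return all(influences(prev, move) for prev in recent)


def filter_moves(candidates, history, m, exceptions_used, max_exceptions, influences):
    """Return (move, exceptions_used_after) pairs that survive the relevance cut.

    Relevant moves are always kept. A non-relevant move is kept only while
    the exception budget lasts, and keeping it consumes one exception.
    """
    kept = []
    for move in candidates:
        if is_relevant(move, history, m, influences):
            kept.append((move, exceptions_used))
        elif exceptions_used < max_exceptions:
            kept.append((move, exceptions_used + 1))
        # otherwise the move is cut off
    return kept


def near(a, b, radius=2):
    """Toy influence predicate (assumption): squares within a Manhattan radius."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) <= radius
```

With a history of `[(0, 0), (1, 0)]` and `m = 2`, the nearby move `(1, 1)` is kept for free, while the distant move `(9, 9)` survives only if an exception is still available.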

For our Sokoban program Rolling Stone, relevance cuts result in a large reduction
of the search space. These reductions are on top of an already highly
efficient¹ searcher. On a standard set of 90 test problems, relevance cuts allow
Rolling Stone to increase the number of problems it can solve from 39 to 44. 
Given that the problems increase exponentially in difficulty, this relatively small 
increase in the number of problems solved represents a large increase in search 
efficiency. 



2 Sokoban and Related Work 

Single-agent search (A*) has been extensively studied in the literature. There 
are a plethora of enhancements to the basic algorithm, allowing the application 
developer to customize their implementation. The result is an impressive reduction
in the search effort required to solve challenging applications (see [10] for
a recent example). However, the applications used to illustrate the advances in 
single-agent search efficiency are “easy” in the sense that they have some (or all) 
of the following properties: 

1. effective, inexpensive lower-bound estimators,

2. small branching factor in the search tree, and 

3. moderate solution lengths. 

The sliding-tile puzzles are the best known examples of these problems. Prob- 
lem domains such as these also have the important property that given a solvable 
starting state, every move preserves the solvability (although not necessarily the 
optimality).

Sokoban is a popular one-player computer game. The game originated in 
Japan, although the original author is unknown. The game’s appeal comes from 
the simplicity of the rules and the intellectual challenge offered by deceptively 
easy problems. 

Figure 1 shows a sample Sokoban problem.² The playing area consists of
rooms and passageways, laid out on a rectangular grid of size 20x20 or less. 
Littered throughout the playing area are stones (shown as circular discs) and 
goals (shaded squares). There is a man whose job it is to move each stone to a 
goal square. The man can only push one stone at a time and must push from 
behind the stone. A square can only be occupied by one of a wall, stone or man 
at any time. Getting all the stones to the goal squares can be quite challenging; 
doing this in the minimum number of moves is much more difficult. 

1 Of course, “highly efficient” here is meant in terms of a computer program. Humans
shake their heads in disbelief when they see some of the ridiculous lines of play
considered in the search.

2 This is problem 1 of the standard 90-problem suite available at
http://xsokoban.lcs.mit.edu/xsokoban.html.

He-Ge
Hd-Hc-Hd
Fe-Ff-Fg
Fh-Gh-Hh-Ih-Jh-Kh-Lh-Mh-Nh-Oh-Ph-Qh-Rh-Rg
Fg-Fh-Gh-Hh-Ih-Jh-Kh-Lh-Mh-Nh-Oh-Ph-Qh-Qi-Ri
Fc-Fd-Fe-Ff-Fg-Fh-Gh-Hh-Ih-Jh-Kh-Lh-Mh-Nh-Oh-Ph-Qh-Qg
Ge-Fe-Ff-Fg-Fh-Gh-Hh-Ih-Jh-Kh-Lh-Mh-Nh-Oh-Ph-Qh-Rh
Hd-He-Ge-Fe-Ff-Fg-Fh-Gh-Hh-Ih-Jh-Kh-Lh-Mh-Nh-Oh-Ph-Pi-Qi
Ch-Dh-Eh-Fh-Gh-Hh-Ih-Jh-Kh-Lh-Mh-Nh-Oh-Ph-Qh

Fig. 1. Sokoban Problem 1 With One Solution

To refer to squares in a Sokoban problem, we use a coordinate notation. The
horizontal axis is labeled from “A” to “T”, and the vertical axis from “a” to “t”
(assuming the maximum sized 20x20 problems), starting in the upper left corner.
A move consists of pushing a stone from one square to another. For example, in
Figure 1 the move Fh-Eh moves the stone on Fh left one square. We use Fh-Eh-Dh
to indicate a sequence of pushes of the same stone. A move, of course, is only
legal if there is a valid path by which the man can move behind the stone and
push it. Thus, although we only indicate stone moves (such as Fh-Eh), implicit
in this is the man’s moves from its current position to the appropriate square
to do the push (for Fh-Eh the man would have to move from Li to Gh via the
squares Lh, Kh, Jh, Ih and Hh).
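
The coordinate notation above can be made concrete with a small parser. This is an illustrative sketch only; the helper names (`parse_square`, `format_square`, `parse_move`, `man_square`) are hypothetical and not taken from the paper, and no maze walls or reachability checks are modelled.

```python
import string

BOARD_SIZE = 20  # maximum board size mentioned in the text


def parse_square(name):
    """Convert a square such as 'Fh' to zero-based (column, row), origin upper left."""
    if len(name) != 2:
        raise ValueError(f"bad square: {name!r}")
    col = string.ascii_uppercase.index(name[0])  # raises ValueError if not 'A'..'Z'
    row = string.ascii_lowercase.index(name[1])  # raises ValueError if not 'a'..'z'
    if col >= BOARD_SIZE or row >= BOARD_SIZE:
        raise ValueError(f"square outside the 20x20 area: {name!r}")
    return col, row


def format_square(square):
    """Inverse of parse_square: (5, 7) -> 'Fh'."""
    col, row = square
    return string.ascii_uppercase[col] + string.ascii_lowercase[row]


def parse_move(move):
    """Split a move such as 'Fh-Eh-Dh' into single pushes (from_square, to_square)."""
    squares = [parse_square(s) for s in move.split("-")]
    pushes = list(zip(squares, squares[1:]))
    for (c0, r0), (c1, r1) in pushes:
        if abs(c0 - c1) + abs(r0 - r1) != 1:
            raise ValueError(f"not a one-square push in {move!r}")
    return pushes


def man_square(push):
    """Square the man must stand on to make the push: directly behind the stone."""
    (c0, r0), (c1, r1) = push
    return (2 * c0 - c1, 2 * r0 - r1)
```

For the push Fh-Eh, `man_square` yields Gh, which agrees with the example in the text where the man has to walk to Gh before pushing.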

Unlike most single-agent search problems studied in the literature, a single 
Sokoban move can change a problem from being solvable to unsolvable. For 
example, in Figure 1 making the move Fh-Fg creates an unsolvable problem. It
requires a non-trivial analysis to verify this deadlock. This is a simple example, 
since deadlock configurations can be large and span the entire board. Identifying 
deadlock is critical to prevent a lot of futile searching. 

The standard 90 problems range from easy (such as problem 1 above) to 
difficult (requiring hundreds of stone pushes). A global score file is maintained 
showing who has solved which problems and how efficient their solution is (also 
at http://xsokoban.lcs.mit.edu/xsokoban.html). Thus solving a problem is only 
part of the satisfaction; improving on your solution is equally important. 

Sokoban has been shown to be PSPACE-complete [?]. Dor and Zwick
show that the game is an instance of a motion planning problem, and compare
the game to other motion planning problems in the literature [3]. For example,
Sokoban is similar to Wilfong’s work with movable obstacles, where the man
is allowed to hold on to the obstacle and move with it, as if they were one
object [?]. Sokoban can be compared to the problem of having a robot in a
warehouse move a number of specified goods from their current location to their 
final destination, subject to the topology of the warehouse and any obstacles 
in the way. When viewed in this context, Sokoban is an excellent example of 
using a game as an experimental test-bed for mainstream research in artificial 
intelligence. 

Sokoban is a difficult problem domain for computers because of the following 
reasons: 

1. it has a complex lower-bound estimator (O(n^3), given n goals),

2. the branching factor is large and variable (potentially over 100), 

3. the solution may be very long (some problems require over 500 moves to 
solve optimally), 

4. the search space complexity is O(10^98) for problems restricted to a 20x20
area only, and 

5. some reachable states are unsolvable (deadlock). 

For sliding-tile puzzles, there are algorithms for generating a non-optimal 
solution. In Sokoban, because of the presence of deadlock, often it is very difficult 
to find any solution. 

Our previous attempts to solve Sokoban problems using standard single- 
agent search techniques are reported in [2]. There, using our program Rolling
Stone, we compare the different techniques and their usefulness with respect to
the search efficiency when solving Sokoban problems. IDA* was augmented 
with a sophisticated lower bound estimator, transposition tables, move ordering, 
macro moves and deadlock tables. Even though each of the standard single-agent 
search enhancements we investigated resulted in significant improvements (often 
several orders of magnitude in search-tree size reduction), at the time we were 
able to solve only 20 problems of a 90-problem test suite. 
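
For readers unfamiliar with the base algorithm, a plain IDA* loop can be sketched as follows. This is a generic textbook-style sketch, not the code of Rolling Stone: the enhancements listed above (transposition tables, move ordering, macro moves, deadlock tables) are deliberately omitted, and the callback names `is_goal`, `successors`, and `h` are assumptions made for illustration.

```python
import math


def ida_star(start, is_goal, successors, h):
    """Iterative-deepening A*.

    successors(state) yields (next_state, cost) pairs; h(state) is an
    admissible lower bound on the remaining cost. Returns the solution
    path as a list of states, or None if no solution exists.
    """

    def search(path, g, threshold):
        state = path[-1]
        f = g + h(state)
        if f > threshold:
            return f, None            # report the smallest f that exceeded the bound
        if is_goal(state):
            return f, list(path)
        minimum = math.inf
        for nxt, cost in successors(state):
            if nxt in path:           # avoid cycles along the current path
                continue
            path.append(nxt)
            t, solution = search(path, g + cost, threshold)
            path.pop()
            if solution is not None:
                return t, solution
            minimum = min(minimum, t)
        return minimum, None

    threshold = h(start)
    while True:
        t, solution = search([start], 0, threshold)
        if solution is not None:
            return solution
        if t == math.inf:             # nothing left beyond the threshold
            return None
        threshold = t                 # next iteration uses the smallest overshoot
```

Each iteration is a depth-first search bounded by the current threshold, which is why an unsolved iteration amounts to proving that no solution exists within that threshold.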

In [?] we introduced a new search enhancement, pattern searches, a method
that dynamically finds deadlocks and improved lower bounds. Since a single move 
can introduce a deadlock, before playing a move we perform a pattern search to 
analyze if deadlock will be introduced by that move. The pattern search attempts 
to identify the conditions for a deadlock and, if all the conditions are satisfied, 
saves a pattern of stones that is the minimal board configuration required for the 
deadlock. During the IDA* search, a new position can be matched with these 
patterns to see if it contains a deadlock. As a side benefit, these pattern searches 
can also identify arbitrary increases to the lower bound (e.g. a deadlock increases 
the lower bound to ∞). 
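
The matching step can be illustrated with a minimal C sketch. It assumes a hypothetical bit-set board encoding with one bit per square (at most 64 squares); the type `StoneSet` and the function `contains_deadlock` are illustrative names, not taken from Rolling Stone, and real patterns may carry more conditions than stone placement alone.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: one bit per maze square, set when a stone is on it. */
typedef uint64_t StoneSet;

/* Returns true if any stored deadlock pattern is fully contained
 * in the current stone configuration. */
bool contains_deadlock(StoneSet stones, const StoneSet *patterns, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if ((stones & patterns[i]) == patterns[i])
            return true;    /* every stone of pattern i is present */
    return false;
}
```

A position that matches any saved pattern can be discarded without further search, which is why a match corresponds to a lower bound of ∞.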

The notion of bit (stone) patterns is similar to the Method of Analogies [1]. 
Pattern searches are a conflict-driven top-down proof of correctness, while the 
Method of Analogies is a bottom-up heuristic approximation. 

Pattern searches allow us to now solve 39 of the 90 problems (see footnote 3). 
Although pattern searches can be enhanced to make them more efficient, we 
concluded that they are inadequate to successfully solve all 90 Sokoban test 
positions. Even with all the enhancements, and the cumulative improvements of 
several orders of magnitude in search efficiency, the search trees are still too 
deep and the effective branching factor too high. Hence, we need to find further 
ways to improve the search efficiency. 

3 Note that 0 reports slightly different numbers than this paper, caused by subsequent 
refinements to the pattern searches and bug fixes. 



3 Relevance Cuts 

Analyzing the trees built by an IDA* search quickly reveals that the search 
algorithm considers move sequences that no human would ever consider. Even 
completely unrelated moves are tested in every legal combination - all in an 
effort to prove that there is no solution for the current threshold. How can a 
program mimic an “understanding” of relevance? We suggest that a reasonable 
approximation of relevance is influence. If two moves are not influencing each 
other then they are very unlikely to be relevant to each other. If a program 
had a good “sense” of influence, it could assume that in a given position all 
previous moves belong to a (unknown) plan of which a continuation can only 
be a move that is relevant - in our approximation, is influencing whatever was 
played previously. 

Thus, the general idea for relevance cuts is to prevent the program from 
trying all possible move sequences. Moves tried have to be relevant to previously 
executed moves. This can be achieved in different, domain specific, ways. The 
following shows one implementation for the domain of Sokoban. Even though 
the specifics aren’t necessarily applicable to other domains, the basic philosophy 
of the approach is. 



3.1 Influence 

When judging how two squares in a Sokoban maze are influencing each other, 
Euclidean distance is not adequate. Taking the structure of the maze into account 
would lead to a simple geographic distance which is still not proportional with 
influence. For example, consider two squares connected by a tunnel; the squares 
are equally influencing each other, no matter how long the tunnel is. Figure 1 
shows several tunnels, of which one consists of the squares Ff and Fg. Prolonging 
the tunnel without changing the general topology of the problem would change 
the geographic distance, but not the influence. 

The following is a list of properties we would like the influence measure to 
reflect: 

Alternatives: The more alternatives that exist on a path between two squares, 
the less they influence each other. That is, squares in the middle of a room 
where stones can go in all 4 directions should decrease influence more than 
squares in a tunnel, where no alternatives exist. 

Goal-Skew: Squares on the optimal path to any goal should have stronger 
influence than squares off the optimal path. 

Connection: Two neighboring squares connected such that a stone can move 
between them should influence each other more than two squares connected 
such that only the man can move between them. 

Tunnel: In a tunnel, influence remains the same: It does not matter how long 
the tunnel is (one could, for example, collapse a tunnel into one square). 

Our first implementation of relevance cuts used small off-line searches to 
statically precalculate a (20x20)x(20x20) table containing the influence values 
for each square of the maze to every other square in the maze. Between every 
pair of squares, a breadth-first search is used to find the path(s) with the largest 
influence. The algorithm is similar to a shortest-path finding algorithm, except 
that we are using influence here and not geographic distance. The smaller the 
influence number, the more two squares are influencing each other. 
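
As a rough illustration of the table computation, the following C sketch treats it as an all-pairs least-cost path problem. Note the substitution: instead of the per-pair breadth-first search described above, it uses a Floyd-Warshall relaxation, which yields the same smallest accumulated cost on a small graph. The graph size `N_SQ`, the sentinel `NO_LINK`, and the name `fill_influence_table` are assumptions made for this example only. Step costs are directed, so the resulting table need not be symmetric.

```c
#include <limits.h>

#define N_SQ 4            /* hypothetical tiny maze with 4 squares */
#define NO_LINK INT_MAX   /* marks "no direct step between squares" */

/* On entry, table[a][b] holds the cost of a direct step a -> b
 * (or NO_LINK). On exit it holds the smallest accumulated influence
 * number over all paths a -> b; smaller means stronger influence. */
void fill_influence_table(int table[N_SQ][N_SQ])
{
    for (int k = 0; k < N_SQ; k++)
        for (int a = 0; a < N_SQ; a++)
            for (int b = 0; b < N_SQ; b++) {
                if (table[a][k] == NO_LINK || table[k][b] == NO_LINK)
                    continue;               /* no path through k */
                int via = table[a][k] + table[k][b];
                if (via < table[a][b])
                    table[a][b] = via;
            }
}
```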

Note that influence is not necessarily symmetric (dist(a,b) ≠ dist(b,a)). A 
square close to a goal influences squares further away more than it is influenced 
by them. Furthermore, dist(a, a) is not necessarily 0. A square in the middle of 
a room will be less influenced by each of its many neighbors than a square in 
a tunnel. To reflect that, squares in the middle of a room receive a larger bias 
than more restricted squares. 

The exact numbers used in our implementation are the following (with the 
name of the wish-list item following in parentheses). Each square on the path 
between the start and goal squares adds 2 for each direction (off the path 
considered) a stone can be pushed and 1 for each direction the man can go. Thus, 
the maximum one square can add for alternatives is 4 (alternatives). However, every 
square that is part of an optimal path towards any of the goals from the start 
square will add only half of that amount (goal-skew). If the connection from the 
previous square on the path to the current square can be taken by a stone, only 
1 is added, else 2 (connection). If the previous square is in a tunnel, 0 is added 
(tunnel), regardless of all other properties. 
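
One possible reading of these rules is sketched below in C. The `Step` structure and the function `step_cost` are hypothetical names introduced for illustration. The sketch also assumes that the "1 for each direction the man can go" applies to directions only the man can take, that at most two off-path directions exist per square, and that halving uses integer division; the text does not settle these details.

```c
#include <stdbool.h>

/* Hypothetical description of one step along a path. */
typedef struct {
    int  stone_dirs;        /* off-path directions a stone can be pushed */
    int  man_only_dirs;     /* off-path directions only the man can go   */
    bool on_optimal_path;   /* square lies on an optimal path to a goal  */
    bool stone_connection;  /* a stone can move from the previous square */
    bool prev_in_tunnel;    /* previous square is part of a tunnel       */
} Step;

/* Influence number added by one square on the path. */
int step_cost(const Step *s)
{
    if (s->prev_in_tunnel)
        return 0;                               /* tunnel */
    int alternatives = 2 * s->stone_dirs + s->man_only_dirs;
    if (s->on_optimal_path)
        alternatives /= 2;                      /* goal-skew */
    int connection = s->stone_connection ? 1 : 2;
    return alternatives + connection;           /* alternatives + connection */
}
```

Under these assumptions an open room square off the optimal path costs 5, the same square on an optimal path costs 3, and any square entered from a tunnel costs 0.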



3.2 Relevance Cut Rules 

Given the above influence measure, we can now proceed to explain how to use 
that information to cut down on the number of moves considered in each position. 
To do this, we need to define distant moves. Given two moves, m1.from-m1.to 
and m2.from-m2.to, move m2 is distant with respect to move m1 if the from 
squares of the moves (m1.from and m2.from) do not influence each other. 
More precisely, two moves influence each other if 

InfluenceTable[m1.from][m2.from] < d 

where InfluenceTable is the table of precalculated values and d is a tunable 
threshold. 

Relevance cuts eliminate some moves that are distant from the previous 
moves played, and therefore are considered not relevant to the search. There 
are two ways that a move can be cut off: 

1. If within the last m moves more than k distant moves were made. This cut 
will discourage arbitrary switches between non-related areas of the maze. 

2. A move that is distant with respect to the previous move, but not distant to 
a move in the past m moves. This will not allow switches back into an area 
previously worked on and abandoned just briefly. 

In our experiments, we set k to 1. This way, the first cut criterion will entail 
the second. The parameters d and m are set according to the following properties 
of the maze. The maximal influence distance, d, is set to half the average influence 
value from all squares to the squares on optimal paths to any goal, but not less 
than 6. The length of history used, m, is set to the average influence value of all 
squares to all other non-dead squares in the maze, but not less than 10. 
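
The two cut rules can be sketched in C as follows. This is one interpretation, not the Rolling Stone code: the names `CutParams`, `is_distant` and `relevance_cut` are hypothetical, the influence table is passed as a flat array, and the candidate move's own switch is counted towards the limit k.

```c
#include <stdbool.h>

typedef struct { int from, to; } Move;

/* Assumption: influence[a * n + b] is the precomputed value a -> b. */
typedef struct {
    const int *influence;
    int n;      /* number of squares            */
    int d;      /* maximal influence distance   */
    int m;      /* length of history considered */
    int k;      /* distant moves allowed        */
} CutParams;

static bool is_distant(const CutParams *p, Move earlier, Move later)
{
    return p->influence[earlier.from * p->n + later.from] >= p->d;
}

/* history[0..len-1] holds the moves already played, oldest first.
 * Returns true if the candidate move should be cut. */
bool relevance_cut(const CutParams *p, const Move *history, int len, Move cand)
{
    if (len == 0)
        return false;
    int start = len > p->m ? len - p->m : 0;

    /* Rule 1: too many distant moves within the last m moves. */
    int distant = 0;
    for (int i = start; i < len; i++)
        if (i > 0 && is_distant(p, history[i - 1], history[i]))
            distant++;
    bool cand_distant = is_distant(p, history[len - 1], cand);
    if (cand_distant)
        distant++;
    if (distant > p->k)
        return true;

    /* Rule 2: distant from the previous move, but close to an
     * older move in the window (a switch back). */
    if (cand_distant)
        for (int i = start; i < len - 1; i++)
            if (!is_distant(p, history[i], cand))
                return true;
    return false;
}
```

With k set to 1, any switch back already exceeds the limit in rule 1, which matches the remark that the first criterion entails the second.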



3.3 Example 

Figure 2 shows an example where humans immediately identify that solving this 
problem involves solving two separate sub-problems. Solving the left and right 
side of the problem is completely independent. An optimal solution needs 82 
moves; Rolling Stone's lower bound estimator returns a value of 70. Standard 
IDA* will need 7 iterations to find a solution (our lower-bound estimator pre- 
serves the odd/even parity of the solution length). In each of the iterations but 
the last, IDA* will try every possible (legal) move combination with moves from 
both sides of the problem. This way IDA* proves for each of the 6 iterations i 
that the problem cannot be solved with 70 + 2*i moves, regardless of the order of 
the considered moves. Clearly, this is unnecessary and inefficient. Solving one of 
the sub-problems requires only 4 iterations, since the lower bound is off by only 
6. Considering this position as two separate problems will result in an enormous 
reduction in the search complexity. 
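
The iteration counts quoted above follow from simple arithmetic, shown here as a small C sketch. The function name `ida_iterations` is an assumption for this example; it presumes that the threshold starts at the lower bound and grows by a fixed step (2 here, because of the parity property) until it reaches the optimal solution length.

```c
#include <assert.h>

/* Number of IDA* iterations when thresholds run
 * lower_bound, lower_bound + step, ... up to the optimal length. */
int ida_iterations(int lower_bound, int optimal_length, int step)
{
    assert(step > 0);
    int iterations = 1;                 /* the final, successful iteration */
    for (int t = lower_bound; t < optimal_length; t += step)
        iterations++;                   /* one failed iteration per threshold */
    return iterations;
}
```

For the full problem, ida_iterations(70, 82, 2) gives 7; for a sub-problem whose lower bound is off by 6, ida_iterations(0, 6, 2) gives 4.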




Fig. 2. Example Maze With Locality 

Our implementation considers all moves on the left and on the right side as 
distant from each other. This way only a limited number of switches is considered 
during the search. Our parameter settings allow for only one non-local move per 
9-move sequence. For this contrived problem, relevance cuts decrease the number 
of nodes searched from 32,803 nodes to 24,748 nodes while still returning an 
optimal solution (the pattern searches were turned off for simplicity). Although 
this is a significant reduction, it is only a small step towards achieving all the 
possible savings. For example, one of the sub-problems can be solved by itself in 
only 329 nodes! The difference between 329 and 32,803 illustrates why IDA* in 
its current form is inadequate for solving large, non-trivial real-world problems. 
Clearly, more sophisticated methods are needed. 

3.4 Discussion 

Further refinement of the parameters used is certainly possible and necessary 
if the full potential of relevance cuts is to be achieved. Some ideas with regard 
to this issue will be discussed in the future work section. 

The overhead of the relevance cuts is negligible, at least for our current 
implementation. The influence of two moves can be established by a simple table 
lookup. This is in stark contrast to our pattern searches, where the overhead 
dominates the cost of the search for most problems. 

4 Experimental Results 

Rolling Stone has been tested on the 90-problem test set with searches limited 
to 20,000,000 nodes. Our previous best version of Rolling Stone was capable of 
solving 39 of the test problems. With the addition of relevance cuts, the number 
of problems solved has increased to 44 (see footnote 4). Table 1 shows a comparison 
of Rolling Stone with and without relevance cuts for each of the 44 solved problems. 

For each program version in Table 1, the third column gives the number of 
IDA* iterations that the program took to solve the problem. Note that problems 
#9, #21 and #51 are now solved non-optimally, taking at least one iteration 
longer than the program without relevance cuts. This confirms the unsafe nature 
of the relevance cuts. However, since none of the problems solved before is lost 
and 5 more are solved, the gamble paid off. Long ago we abandoned our original 
goal of obtaining optimal solutions to Sokoban problems. The size of the search 
space dictates radical pruning measures if we want to have any chance of solving 
some of the tougher problems. 

Of the 5 new problems solved, #11 is of interest. Without relevance cuts, 
only 17 IDA* iterations could be completed within our pre-set limit of 20,000,000 
nodes. Relevance cuts allow Rolling Stone to search 19 iterations and solve the 
problem. Given that the cost of an extra iteration is large (and can typically 
be a factor of about 5,000 per iteration 0), a gain of 2 iterations represents a 
massive improvement. 

4 Note that we “cheat” with problem #46, as we allow it to go 47,000 nodes beyond 
the 20 million node limit. A bug fix pushed it beyond the 20 million limit and we 
wanted it to count in the statistics. We tested all the unsolved problems without the 
relevance cuts to 50 million nodes and no other problem was solved. 

The tree size for each program version given in Table 1 is broken into two 
numbers. Top-level nodes refers to that portion of the search tree that IDA* 
is applied to. Total nodes includes the top-level nodes and the pattern search 
nodes. Clearly, for some problems (such as #45) the cost of performing pattern 
searches overwhelms the search effort, whereas in other problems (such as #53) 
they are a small investment. Further details on pattern searches and when they 
are executed can be found in 0. 

The magnitude of the top-level nodes can be misleading; superficially it looks 
like these problems can be “trivially” solved with few nodes. Using standard 
IDA* with our sophisticated lower bound estimator fails to solve any of the 90 
test problems within our limit of 20,000,000 nodes. Consequently, we added a 
plethora of enhancements to the program, including transposition tables, macro 
moves, move ordering and deadlock tables, each of which is capable of reducing 
the search tree size by one or more orders of magnitude 0 ! Thus the small top- 
level node counts reported in the table are the result of extensive improvements 
to the search algorithm. 

Relevance cuts reduce the number of top-level nodes by at least a factor of 4.5. 
Note that since the program not using relevance cuts cannot solve 5 problems, 
this factor may be a gross underestimation of the actual impact. 

With respect to the total search nodes, relevance cuts improve search effi- 
ciency by almost a factor of three. Again, this is a lower bound. In particular, 
problem #11 still requires an enormous amount of search, given that it still has 
2 iterations to go before it can find the solution. 

Comparing node numbers of individual searches is difficult because of many 
volatile factors in the search. For example, a relevance cut might eliminate a 
branch from the search justifiably, but a pattern search there would have uncov- 
ered valuable information that would have been useful for reducing the search 
in other parts of the tree. Problem #80 is one such example: despite the rele- 
vance cuts the node count goes up from 99 to 123 nodes; an important discovery 
was not made and the rest of the search increases. However, the overall trend 
is in favor of the relevance cuts. An excellent example is problem #70: the top 
level node count is cut down to 3,006 nodes and a solution is found. Previously 
579,037 nodes were considered without finding a solution. 

Figures 3 and 4 plot the amount of effort to solve a problem, using the 
numbers from Table 1 sorted by total nodes. An additional data point is given 
with a curve that shows what the program’s performance was with all the standard 
single-agent search techniques implemented, before pattern searches were 
added. Figure 3 shows the impact of the relevance cuts. The exponential growth 
in difficulty with each additional problem solved is dampened, allowing for more 
problems solved with the same number of nodes. Figure 4 is a logarithmic 
representation of Figure 3. The figure more clearly shows that up to about the 25th 
problem (ordered according to number of nodes needed to solve) there is very 
little difference in effort required; the relevance cuts do not save significant 
portions of the small search trees. However, with larger search trees, the success of 
relevance cuts gets more pronounced. 



5 Conclusions and Future Work 

Relevance cuts provide a crude approximation of human-like problem-solving 
methods by forcing the search to favor local moves over global moves. This 
simple idea provides large reductions in the search tree size, at the expense of 
possibly returning a longer solution. Given the breadth and depth of Sokoban 
search trees, finding optimal solutions is a secondary consideration; finding any 
solution is challenging enough. 

There are several ideas on how to improve the effectiveness of relevance cuts. 

— Use different distances depending on crowding. If many stones are crowding 
an area, it is likely that the relevant area is larger than it would be with fewer 
stones blocking each other. 

— The current influence measure can most likely be improved. A thorough 
investigation of all the parameters used could lead to substantial improve- 
ments. 

— There are several parameters used in the relevance cuts. The setting of those 
is already dependent on properties of the maze. These parameters are critical 
for the performance of the cuts and are also largely responsible for increased 
solution lengths. More research on those details is needed to fully exploit the 
possibilities relevance cuts are offering. 

— So far, Rolling Stone is painting locally, but is not yet “object oriented”. 
If a flower and the bear are close, painting both at the same time is very 
likely. Better methods are needed to further understand subgoals, rather 
than localizing by area. 

Although relevance cuts introduce non-optimality, this is not an issue. Once 
humans solve a Sokoban problem, they have two choices: move on to another 
problem (they are satisfied with the result), or try to re-solve the same problem 
to get a better solution. Rolling Stone could try something similar. Having solved 
the problem once, if we want a better solution, we can reduce the probability of 
introducing non-optimality in the search by decreasing the aggressiveness of the 
relevance cuts. This will make the searches larger but, on the other hand, the 
last iteration does not have to be searched, since a solution for that threshold 
was already found. 

Relevance cuts are yet another way to significantly prune Sokoban search 
trees. We have no shortage of promising ideas, each of which potentially offers 
another order of magnitude reduction in the search tree size. Although this 
sounds impressive, our experience suggests that each factor of 10 improvement 
seems to only yield another 4 or 5 problems being solved. At this rate, we will 
have to do a lot of research if we want to successfully solve all 90 problems! 

Abstract. The efficiency of the αβ-algorithm as a minimax search procedure 
can be attributed to its effective pruning at so-called cut-nodes; 
ideally only one move is examined there to establish the minimax value. 
This paper explores the benefits of investing additional search effort at 
cut-nodes by expanding other move alternatives as well. Our results show 
a strong correlation between the number of promising move alternatives 
at cut-nodes and a new principal variation emerging. Furthermore, a new 
forward pruning method is introduced that uses this additional informa- 
tion to ignore potentially futile subtrees. We also provide experimental 
results with the new pruning method in the domain of chess. 



1 Introduction 

The αβ-algorithm is the most popular method for searching game-trees in such 
adversary board games as chess, checkers and Othello. It is much more efficient 
than a plain brute-force minimax search because it allows a large portion of 
the game-tree to be pruned off, while still backing up the correct game-tree 
value. However, the number of nodes visited by the algorithm still increases 
exponentially with increasing search depth. This obviously limits the scope of 
the search, since game-playing programs must meet external time-constraints: 
often having only a few minutes to make a decision. In general, the quality of 
play improves the further the program looks ahead (see footnote 1). 

Over the years the αβ-algorithm has been enhanced in various ways and 
more efficient variants have been introduced. For example, the basic algorithm 
explores all continuations to some fixed depth, but in practice it is not used that 
way. Instead various heuristics allow variations in the distance to the search hori- 
zon (often called the search depth or search tree height), so that some move se- 
quences can be explored more deeply than others. “Interesting” continuations are 
expanded beyond the nominal depth, while others are terminated prematurely. 
The latter case is referred to as forward pruning, and involves some additional 
risk of overlooking a good continuation. The rationale behind the approach is 
that the time saved by pruning non-promising lines is better spent searching 
other lines more deeply, in an attempt to increase the overall decision quality. 

1 Some artificial games have been constructed where the opposite is true; when backing 
up a minimax value the decision quality actually decreases as we search deeper. This 
phenomenon has been studied thoroughly and is referred to as pathology in game-tree 
search |j]. However, such pathology is not seen in chess or the other games we are 
investigating. 

To effectively apply forward-pruning, good criteria are needed to determine 
which subtrees to ignore. Here we show that the number of good move alternatives 
a player has at cut-nodes can be used to identify potentially futile 
subtrees. Furthermore, we introduce multi-cut αβ-pruning, a new forward pruning 
method that makes its pruning decisions based on the number of promising 
moves at cut-nodes. In the minimax sense it is enough to find one refutation 
to an inferior line of play. However, instead of finding one such refutation, our 
method uses shallow searches to identify moves that “look” good. If there are 
several moves available that seem good enough to refute the current line of play, 
multicut-pruning prevents that particular line from being expanded more deeply. 

In the following section we introduce the basic idea behind the new pruning 
method, and then show how the idea is implemented in an actual game-playing 
program. Experimental results follow; first the promise of the new pruning crite- 
rion is established, and second the new pruning method is tested in the domain 
of chess. Finally, before drawing our conclusions we explain how related works 
use complementary ideas. 



2 Multi-cut Idea 

In a traditional αβ-search, if a move returns a value greater than or equal to β there 
is no reason to examine that position further, and the search can return. This is 
often referred to as a β-cutoff (we are using here Knuth’s 0 nega-max formulation 
of the αβ-algorithm, where there is no distinction between α- and β-cutoffs). 
Intuitively, this means that the player to move has found a way to refute the 
current line of play, so there is no need to find a better refutation. By way of 
explanation, and to introduce our terminology, we are seeking the principal 
variation (pv): the best sequence of moves from the root node (current position in 
the game) to the best of the accessible nodes on the search horizon. We expect 
β-cutoffs to occur at so-called cut-nodes (that is, nodes that are refuted). The root 
node of a game-tree is a pv-node, the first child of a pv-node is also a pv-node, 
while the other children are cut-nodes. All children of a cut-node are all-nodes 
(where every successor must be explored) and vice versa. In a perfectly ordered 
tree only one child of a cut-node is expanded. If a new best move is found at a 
pv-node, the node it leads to also becomes a pv-node. At pv- and all-nodes every 
successor is examined. At a cut-node, most often it is the first child that causes 
the cutoff, but if it fails to do so the sibling nodes are expanded in turn, until 
either one returns a value greater than or equal to β or all the children have been 
searched. If none of the moves causes a cutoff, a cut-node becomes an all-node. 
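
The node-type rules in this paragraph can be written down as a small C sketch. The enumeration `NodeType` and the function `expected_child_type` are illustrative names, and children are assumed to be numbered from 0 in move-ordering order.

```c
typedef enum { PV, CUT, ALL } NodeType;

/* Expected type of the child at position child_index (0 = first move). */
NodeType expected_child_type(NodeType parent, int child_index)
{
    switch (parent) {
    case PV:
        return child_index == 0 ? PV : CUT;  /* first child continues the pv */
    case CUT:
        return ALL;                          /* children of cut-nodes */
    default:
        return CUT;                          /* children of all-nodes */
    }
}
```

These are expectations only: a cut-node whose moves all fail to cause a cutoff turns out to be an all-node after it has been searched.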

For a new principal variation to emerge, every expected cut-node on the path 
from a leaf-node to the root must become an all-node. In practice, however, it 
is common that if the first move does not cause a cutoff, one of the alternative 
moves will. Therefore, expected cut-nodes, where many moves have a good 
potential of causing a β-cutoff, are less likely to become all-nodes, and 
consequently such lines are unlikely to become part of a new principal variation. This 
observation forms the basis for the new forward pruning scheme we introduce 
here, multi-cut αβ-pruning. Before explaining how it works, let us first define an 
mc-prune (multi-cut prune). 

Definition 1 (mc-prune). When searching node v to depth d + 1 using αβ-search, 
if at least c of the first m children of v return a value greater than or 
equal to β when searched to depth d − r, an mc-prune is said to occur and the 
search can return. 
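
Stated as code, the condition in Definition 1 is a simple count. The C sketch below assumes the reduced-depth search results of the children are already available in an array, in move order; the function name `mc_prune` and its parameter layout are chosen for illustration only.

```c
#include <stdbool.h>

/* shallow[i] is the value of child i searched to depth d - r.
 * Returns true if at least c of the first m children reach beta. */
bool mc_prune(const int *shallow, int n_children, int m, int c, int beta)
{
    int cutoffs = 0;
    for (int i = 0; i < n_children && i < m; i++)
        if (shallow[i] >= beta && ++cutoffs >= c)
            return true;
    return false;
}
```

In an actual search the children would be examined one at a time and the loop would stop at the c-th cutoff, so not all m shallow searches are necessarily performed.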

In multi-cut αβ-search, we try for an mc-prune only at expected cut-nodes (we 
would not expect it to be successful elsewhere). Figure 1 shows the basic idea. 
At node v, before searching v1 to a full depth d like a normal αβ-search does, 
the first m successors of v are expanded to a reduced depth of d − r. If c of 
them return a value greater than or equal to β, an mc-prune occurs and the search 
returns the value of β; otherwise the search continues as usual, exploring v1 to 
a full depth d. The subtrees below v2, ..., vm, to depth (d − r), represent extra 
search overhead introduced by mc-prune, and would not be expanded by normal 
αβ-search, but the dotted area of the subtree below node v1 shows the savings 
that are possible if the mc-prune is successful. However, if the pruning condition 
is not satisfied, we are left with the overhead but no savings. By searching the 
subtree of v1 to less depth, there is of course some risk of overlooking a tactic 
that would make v1 become a new principal variation. We are willing to take 
that risk, because we expect that at least one of the c moves that return a value 
greater than or equal to β when searched to a reduced depth will cause a genuine 
β-cutoff if searched to a full depth. 

3 Multi-cut Implementation 

Figure 2 is a C-code version of a null-window search (NWS) routine using multi-cut. 
For clarity we have omitted details about search extensions, transposition 
table lookups, null-move searches, and history heuristic updates that are 
irrelevant to our discussion. The NWS routine 0 is an integral part of the Principal 
Variation Search algorithm. Multi-cut could equally well be implemented in 
other enhanced αβ-variants like NegaScout JZ(. The parameter depth is the 
remaining length of search for the position, and β is an upper-bound on the value 
we can achieve. There is no need to pass α as a parameter, because it is always 
equal to β − 1. On the other hand, the new parameter, node_type, shows the 
expected type of the node we are currently looking at. In a null-window search 
we are dealing only with either cut-nodes (CUT) or all-nodes (ALL). 

As is normal, the routine starts by checking whether the horizon has been 
reached, and if so uses a quiescence search (QS) to return the value of the 
position. Otherwise, we look for useful information about the position in the 
transposition table. This is followed by a null-move search (most chess programs 
use this powerful technique). Normally a standard null-window αβ search would 
follow, if the null-move does not cause a cut-off. Instead we insert here a multi-cut 
search to see if the mc-prune condition applies. The parameters mc_M, mc_R, 
and mc_C stand for m (number of moves to look at), r (search reduction), and 
c (number of cutoffs needed), respectively. Although they are shown here as 
constants, they could be determined more dynamically and be allowed to vary 
during the search. 

Fig. 1. Applying the mc-prune method at node v 

We do not check for the mc-prune conditions at every node in the tree. First, 
we only test for them at expected cut-nodes. Second, they are not applied at 
the levels of the search tree close to the horizon, thus reducing the time overhead 
involved in this method. Finally, there are some game-dependent restrictions 
that apply. In Figure 2 these latter restrictions are encapsulated in the function 
TryMultiCut(). In our experiments in the domain of chess (see later) the pruning 
is disabled when the endgame is reached, since there are usually few viable 
move options there and the mc-searches are therefore not likely to be successful. 
Also, the positional understanding of chess programs in the endgame is generally 
poorer than in the earlier phases of the game. Therefore the programs rely 
more heavily on the search to guide them, and any forward pruning scheme is 
therefore more likely to be harmful. Furthermore, the pruning is not done if the 
side to move is in check, or if search extensions have been applied for any of the 
three previous moves leading to the current position. 

#define mc_M 10   // Multi-Cut: # of moves to look at
#define mc_C 3    //            # of cuts to cause an mc-prune
#define mc_R 2    //            depth reduction

#define CUT 2
#define ALL 3
#define CHILDTYPE(t) (((t)==CUT) ? ALL : CUT)

VALUE NWS(int depth, VALUE beta, NODETYPE node_type)
{
    VALUE score;
    MOVE move;

    . . . Search extension code omitted . . .

    if ( depth <= 0 ) return QS(beta-1, beta);

    . . . Transposition table lookup and nullmove-search code omitted . . .

    // Multi-Cut pruning
    if ( (node_type == CUT) && (depth > mc_R) && TryMultiCut() ) {
        int m = 0, c = 0;
        move = MoveFirst();
        while ( m < mc_M && move ) {
            MakeMove( move );
            score = -NWS(depth-mc_R-1, -beta+1, ALL);
            RetractMove( move );
            if ( score >= beta ) {
                c++;
                if ( c == mc_C ) return beta;
            }
            m++;
            move = MoveNext();
        }
    }

    // Standard null-window search
    move = MoveFirst();
    while ( move ) {
        MakeMove( move );
        score = -NWS(depth-1, -beta+1, CHILDTYPE(node_type));
        RetractMove( move );
        if ( score >= beta ) break;
        move = MoveNext();
    }

    . . . Update transposition/history table code omitted . . .
    return score;
}


Fig. 2. Multi-Cut component within Principal Variation Search 
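
For readers who want to experiment, the following is a self-contained toy variant of the routine in Fig. 2. It is a sketch under several assumptions that are not part of the original program: the game tree is a synthetic uniform tree with branching factor `BRANCH`, leaf values come from a hash of the node id instead of a quiescence search, and the `use_mc` switch, `leaf_value`, `child_id` and the plain `negamax` checker are added purely for illustration.

```c
#include <stdint.h>

#define BRANCH 4      /* hypothetical uniform branching factor */
#define MC_M 3        /* number of moves to look at            */
#define MC_C 2        /* number of cutoffs for an mc-prune     */
#define MC_R 2        /* depth reduction                       */

typedef enum { CUT, ALL } NodeType;

/* Deterministic pseudo-random leaf value in [-50, 50]. */
static int leaf_value(uint32_t id)
{
    id ^= id >> 16; id *= 0x45d9f3bu; id ^= id >> 16;
    return (int)(id % 101u) - 50;
}

static uint32_t child_id(uint32_t id, int i)
{
    return id * BRANCH + (uint32_t)i + 1u;
}

/* Null-window search around beta; multi-cut is tried when use_mc != 0. */
int nws(uint32_t id, int depth, int beta, NodeType type, int use_mc)
{
    if (depth <= 0)
        return leaf_value(id);          /* stands in for QS() */

    if (use_mc && type == CUT && depth > MC_R) {
        int c = 0;
        for (int i = 0; i < MC_M && i < BRANCH; i++) {
            int s = -nws(child_id(id, i), depth - 1 - MC_R,
                         -beta + 1, ALL, use_mc);
            if (s >= beta && ++c == MC_C)
                return beta;            /* mc-prune */
        }
    }

    int score = -1000;
    for (int i = 0; i < BRANCH; i++) {
        score = -nws(child_id(id, i), depth - 1, -beta + 1,
                     type == CUT ? ALL : CUT, use_mc);
        if (score >= beta)
            break;                      /* genuine beta-cutoff */
    }
    return score;
}

/* Plain negamax, used only to check the null-window answer. */
int negamax(uint32_t id, int depth)
{
    if (depth <= 0)
        return leaf_value(id);
    int best = -1000;
    for (int i = 0; i < BRANCH; i++) {
        int s = -negamax(child_id(id, i), depth - 1);
        if (s > best)
            best = s;
    }
    return best;
}
```

With `use_mc` set to 0 the routine answers the same question as `negamax` (is the value at least beta?); with `use_mc` set to 1 the answer may differ, which is exactly the risk a forward pruning method accepts.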

4 Criteria Selection Process 

The multi-cut idea stands or falls with the hypothesis that nodes having many 
promising move alternatives are more likely to cause a β-cutoff than those with 
fewer promising move alternatives. In turn, this implies that we have a method 
for determining which moves are promising. In mc-pruning, shallow searches 
are used to identify promising moves. This scheme was chosen based on the 
experiments presented in this section. 

4.1 Identifying False Cut-Nodes 

We will refer to any node where a β-cutoff is anticipated as an expected cut-node. 
Only after searching the node do we know if it actually causes a cutoff; 
if it does we call it a True cut-node, otherwise a False cut-node. What we seek 
is a scheme that accurately predicts which expected cut-nodes are False. We 
experimented with the following four different ways of anticipating cut nodes: 

1. Number of legal moves (NM): 

The most straightforward approach is simply to assume that every move 
has the same potential of causing a β-cutoff. Therefore, the more children an 
expected cut-node has, the more likely it is to be a True cut-node. Although 
this assumption is not realistic, it can serve as a baseline for comparison. 

2. History heuristic (HH > A): 

A more sensible approach is to distinguish between good and bad moves. 
For example, by using information from the history-heuristic table JS|. Moves 
with a positive history-heuristic value are known to be useful elsewhere in the 
search-tree. This method defines moves with a history-heuristic value greater 
than a constant A as potentially good. One advantage of this scheme is that 
no additional search is required. 

3. Quiescence search (QS() ≥ β − δ):

Here quiescence search is used to determine which children of a cut-node
have a potential for causing a cutoff. If the quiescence search returns a value
greater than or equal to β − δ, the child is considered promising. The constant
δ, called the β-cutoff margin, can be either positive or negative. Although this
scheme may require additional search, it will hopefully give a better estimate
than the previous schemes.

4. Null-window search (NS(d − r) ≥ β − δ):

This scheme is much like the one above, except instead of using quiescence
search to estimate the merit of the children, a null-window search to a closer
horizon at distance d − r is used.
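
The four schemes share one shape: each child of the expected cut-node receives
an estimate, and the child counts as promising when the estimate reaches a
threshold. The following minimal sketch illustrates that shape for the two
search-based schemes. The function name, the `estimate` callback (standing in
for a quiescence or shallow null-window search), and the toy values are
assumptions made for illustration, not part of the original program.

```python
from typing import Callable, Sequence


def count_promising(children: Sequence[str],
                    estimate: Callable[[str], int],
                    beta: int,
                    delta: int = 0) -> int:
    """Count the children whose estimated value is at least beta - delta."""
    return sum(1 for child in children if estimate(child) >= beta - delta)


# Made-up estimates, seen from the side to move at the expected cut-node.
toy_values = {"a": 120, "b": 95, "c": 40, "d": -10}
moves = list(toy_values)

strict = count_promising(moves, toy_values.__getitem__, beta=100)
relaxed = count_promising(moves, toy_values.__getitem__, beta=100, delta=25)
print(strict, relaxed)
```

A positive margin δ lowers the threshold and therefore can only increase the
count, which matches the larger averages reported for the β − 25 variants.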

Next, we established how well the number of promising moves, as judged by each
of the above schemes, correlates with an expected cut-node being a True cut-node
or not. To do this, the four test functions were implemented in a chess program,
TheTurk.²

² TheTurk is a chess program developed at the University of Alberta by Yngvi
Bjornsson and Andreas Junghanns.

When the program visits an expected cut-node it calculates the number of
promising move alternatives in the position according to each scheme. Then,
after searching the node to full depth to determine if it really is a cut-node,
the number of promising moves is logged to a file along with a flag
indicating whether the node is a True cut-node.



4.2 Experimental Results 

Fifty middle-game positions, taken from the Botvinnik vs. Tal world
championship match in 1960, were used in the experiment. The positions were
sampled every 5th move from move 10 to move 30. Furthermore, for the sake of
consistency, we only used positions from games when Tal was playing the White
side and it was his turn to move. The program performed a search for each
position, logging relevant information at expected cut-nodes, as described above.

The resulting data was classified into two groups, one with the True
cut-nodes, and the other with the False cut-nodes. We gathered statistics about
100,000 expected cut-nodes, and of these only 2.5% were classified incorrectly
(i.e. were False cut-nodes). The average number of promising moves, as judged
by each scheme, is presented in Table 1. The second column shows the average
for the True cut-node group and the third column the average for the False
cut-node group. By comparing the averages and the standard deviations (also
shown in the table) of the two groups we can determine the scheme that can
best predict False cut-nodes. That is, we are looking for the scheme that has
the greatest difference between the averages for the two groups, and the lowest
standard deviation.
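
The selection criterion can be made concrete with a small calculation on the
values reported in Table 1. The sketch below folds both criteria into a single
ad hoc ratio (difference of the averages divided by the sum of the standard
deviations). This ratio is an assumption chosen for illustration and is not a
statistic used in the paper; the dictionary keys are plain-text labels for the
table rows.

```python
# Rows of Table 1: (mean True, sd True, mean False, sd False).
table1 = {
    "NM": (35.60, 11.74, 24.83, 14.46),
    "HH > 0": (22.27, 8.87, 16.35, 9.77),
    "HH > 100": (9.15, 5.72, 7.13, 5.33),
    "QS() >= beta": (20.48, 15.03, 0.32, 1.44),
    "QS() >= beta-25": (23.70, 14.08, 1.66, 4.20),
    "NS(d-2) >= beta": (20.62, 14.88, 0.17, 0.55),
    "NS(d-2) >= beta-25": (23.75, 14.00, 1.46, 3.75),
}


def separation(mean_true, sd_true, mean_false, sd_false):
    """Illustrative score: larger means the two groups are easier to tell apart."""
    return (mean_true - mean_false) / (sd_true + sd_false)


ranked = sorted(table1, key=lambda name: separation(*table1[name]), reverse=True)
for name in ranked:
    print(f"{name:20s} {separation(*table1[name]):.3f}")
```

Under this particular score the search-based schemes rank well ahead of NM and
the history-heuristic schemes, in line with the discussion of the table.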



Table 1. Comparison of different schemes for identifying False cut-nodes 



Method                 True cut-nodes       False cut-nodes
                       x̄        σ           x̄        σ
NM                     35.60    11.74       24.83    14.46
HH > 0                 22.27     8.87       16.35     9.77
HH > 100                9.15     5.72        7.13     5.33
QS() ≥ β               20.48    15.03        0.32     1.44
QS() ≥ β − 25          23.70    14.08        1.66     4.20
NS(d − 2) ≥ β          20.62    14.88        0.17     0.55
NS(d − 2) ≥ β − 25     23.75    14.00        1.46     3.75


In Table 1 it is interesting to note that even a simplistic scheme like looking
at the number of legal moves shows a difference in the averages. However, the
difference is relatively small and the standard deviation is high. The history
heuristic schemes have lower standard deviation, but unfortunately the averages
are too similar. This renders them useless. The methods that rely on search,
QS() and NS(), do much better, especially those where δ (the β-cutoff margin) is
set to zero.³ Not only are the averages for the two groups far apart, but the
standard deviation is also very low. From the data in Table 1 the two schemes
look almost equally effective. Therefore, to discriminate between them further,
we filtered the data for the False cut-nodes looking only at non-zero
data-points (that is, we only consider data-points where at least one promising
move alternative is found by either scheme). The result using the filtered data
is given in Table 2. Now we can see more clearly that the null-window (NS)
scheme is a better predictor of False cut-nodes. Not only does it show on
average fewer false promises, but the standard deviation is also much lower.
This means that it only very infrequently shows False cut-nodes as having more
than several promising move alternatives. Even in the worst case there never
were more than 6 moves listed as promising, while for the QS() scheme at least
one position had 32 false indicators.



Table 2. Comparison of selected schemes using filtered data 



Method             False cut-nodes
                   x̄       σ
QS() ≥ β           2.31    3.20
NS(d − 2) ≥ β      1.45    0.86



The above experiments clearly support the hypothesis that there is a way to
discriminate between nodes that are likely to become True cut-nodes and those
that are not. As a result we selected the shallow null-window searches as the
scheme for finding promising moves in the multi-cut αβ-pruning.



5 Multi-cut in Practice 

Ultimately, we want to show that game-playing programs using the new pruning
method can achieve increased playing strength. To test the idea in practice,
multi-cut αβ-pruning was implemented in TheTurk. Two versions of the program
were matched against each other, one with multi-cut pruning and the other
without. Three matches, with 80 games each, were played using different time
controls. To prevent the programs from playing the same game over and over,
forty well-known opening positions were used as a starting point. The programs
played each opening once from the white side and once as black. Table 3 shows
the match results. T stands for the unmodified version of the program and
T_mc(c, m, r) for the version with multi-cut implemented. We experimented with
the case m = 10, r = 2, and c = 3 (i.e. 10 moves searched with a depth reduction
of 2 ply and with 3 β-cutoffs required to achieve the mc-prune condition). These
parameter values are somewhat arbitrary, based on experience and a few test
trials.
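
The role of the three parameters can be sketched as follows. This is a minimal
illustration of the mc-prune condition as described above, not the code used in
TheTurk: the function name and the `shallow_search` callback (assumed to return
the value of a move, searched with depth reduction r, from the point of view of
the side to move at the expected cut-node) are hypothetical, and the values are
made up.

```python
from typing import Callable, Sequence


def mc_prune(moves: Sequence[str],
             shallow_search: Callable[[str], int],
             beta: int,
             c: int = 3,
             m: int = 10) -> bool:
    """Return True when at least c of the first m moves reach beta."""
    cutoffs = 0
    for move in moves[:m]:
        if shallow_search(move) >= beta:
            cutoffs += 1
            if cutoffs >= c:
                return True  # enough shallow cutoffs: prune without a full search
    return False


# Made-up shallow-search results for five moves.
shallow = {"m1": 110, "m2": 90, "m3": 105, "m4": 130, "m5": 20}
order = list(shallow)

print(mc_prune(order, shallow.__getitem__, beta=100, c=3, m=10))  # three cutoffs found
print(mc_prune(order, shallow.__getitem__, beta=100, c=3, m=3))   # only two among the first three
```

Raising c or lowering m makes the condition harder to satisfy, so pruning
becomes safer but rarer; the depth reduction r is hidden inside the callback in
this sketch.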

³ In TheTurk, a δ value of 25 is equivalent to a quarter of a pawn.






Table 3. Summary of 80-game match results 



T_mc(3, 10, 2) versus T

Time control               Score      Winning %
40 moves in 5 minutes      41 - 39    51.3
40 moves in 15 minutes     40 - 40    50.0
40 moves in 25 minutes     44 - 36    55.0



The multi-cut version shows a slight improvement over the unmodified ver- 
sion. In tournament play this winning percentage would result in about 15 points 
difference in the players’ rating. Since more than 1,000 games are typically 
needed to obtain a standard error of less than 10 rating points FI], we can- 
not claim that the multi-cut version is the stronger, based only on this single set 
of experiments. 
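
As a rough check of these figures, the sketch below converts the combined score
of the three matches in Table 3 into a rating difference. It assumes the
standard logistic Elo expectation formula and pools the three matches; both
choices, and the function names, are assumptions rather than something stated
in the paper. The error estimate treats every game as a win or a loss, so it
overstates the error when draws occur.

```python
import math


def elo_difference(score_fraction: float) -> float:
    """Rating difference implied by a score fraction under the logistic Elo model."""
    return -400.0 * math.log10(1.0 / score_fraction - 1.0)


def elo_standard_error(n_games: int, score_fraction: float = 0.5) -> float:
    """Approximate standard error in rating points (upper bound: no draws assumed)."""
    se_score = math.sqrt(score_fraction * (1.0 - score_fraction) / n_games)
    slope = 400.0 / (math.log(10) * score_fraction * (1.0 - score_fraction))
    return slope * se_score


points_for = 41 + 40 + 44      # multi-cut version, Table 3
points_against = 39 + 40 + 36  # unmodified version, Table 3
games = points_for + points_against

diff = elo_difference(points_for / games)
print(f"rating difference: {diff:.1f}")
print(f"standard error, {games} games: {elo_standard_error(games):.1f}")
print(f"standard error, 1000 games: {elo_standard_error(1000):.1f}")
```

Under these assumptions the estimated difference is smaller than its own
standard error for 240 games, which is why no claim of superiority is made.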

One final insight: the programs gathered statistics about the behavior of the
multi-cut pruning. The search spends about 25%-30% of its time (in terms of
nodes visited) in shallow multi-cut searches, and an mc-prune occurs in about
45%-50% of its attempts. Obviously, the tree expanded using multi-cut pruning
differs significantly from the tree expanded when it is not used.



6 Related Work 

The idea of exploring additional moves at cut-nodes is not entirely new. There
exist at least two other variants of the αβ-algorithm that consider more than
one move alternative at cut-nodes: one is Singular Extensions [2] and the other
McAllester’s Alpha-Beta-Conspiracy Search 0.

The singular extensions algorithm extends “singular” moves more deeply 
than others. A move is defined as singular if its evaluation is higher than all its 
alternative moves by some specified margin, called the singular margin. Moves 
that fail-high, i.e. cause a β-cutoff, automatically become candidates for being
singular (the algorithm also checks for singular moves at pv-nodes). To determine
if a candidate move that fails-high really is singular, all its sibling moves are 
explored to a reduced depth. The move is declared singular only if the value 
of all the alternatives is significantly (as defined by the singular margin) lower 
than the value of the principal variation. Singular moves are “remembered” and 
extended one additional ply on subsequent iterations. This method improved 
the playing strength of Deep Thought (predecessor of Deep Blue) by about 30 
USCF rating points p. One might think of multi-cut as the complement of 
singular-extensions: instead of extending lines where there is seemingly only one 
good move, it prunes lines where there are many promising (refutation) moves 
available. 

The Alpha-Beta-Conspiracy algorithm is essentially an αβ-search that uses
conspiracy depth, instead of classical ply depth, to decide how deep to search.
When determining whether to terminate the search, static evaluations of all
siblings of nodes that lie on the current search path are used. However, empirical
results using this algorithm were not favorable.

In addition, there are some best-first search methods that use the notion of 
having options. They are not discussed here. 

7 Conclusions 

We feel that our experimental results give rise to optimism. Although the
self-play matches do not prove in a statistically significant way that the new
method is better, they clearly show that a search method expanding a radically
different tree than the αβ-algorithm seemingly has at least equal playing
strength.

The multi-cut method is still in its infancy. We are experimenting here with 
a preliminary implementation of the idea. There is still much scope for improve- 
ment through further tuning and enhancement, by experimenting with different 
settings for the parameters c, m, and r, for example. Also, the current implemen- 
tation defines the parameters as constants. Instead, we might need to determine 
them more dynamically, and have the possibility to adjust the values as the 
game/search progresses.

Our experiments show the feasibility of the idea and indicate a strong cor- 
relation between the number of promising move alternatives available at an ex- 
pected cut-node, and the node becoming a True cut-node. The multi-cut idea 
as described and implemented here is not the only way of exploiting this cor- 
relation, and is by no means necessarily the best. There is still much room for 
innovative methods to be developed based on looking at other move options. 

Abstract. In a search graph a node’s value may be dependent on the
path leading to it. Different paths may lead to different values. Hence,
it is difficult to determine the value of any node unambiguously. The
problem is known as the graph-history-interaction (GHI) problem. This
paper provides a solution for best-first search. First, we give a precise
formulation of the problem. Then, for best-first search and for other
searches, we review earlier proposals to overcome the problem. Next,
our solution is given in detail. Here we introduce the notion of twin
nodes, enabling a distinction of nodes according to their history. The
implementation, called BTA (Base-Twin Algorithm), is performed for pn
search, a best-first search algorithm. It is generally applicable to other
best-first search algorithms. Experimental results in the field of computer
chess confirm the claim that the GHI problem has been solved for
best-first search.

Keywords: graph-history interaction (GHI) problem, best-first search,
base-twin algorithm (BTA).



1 The GHI Problem 

Search algorithms are used in many domains, ranging from theorem proving 
to computer games. The algorithms are searching in a state space containing 
problem states (positions), often represented as nodes. A move which transforms 
a position into a new position is represented as an edge connecting the two nodes. 

In a search tree, it may happen that identical nodes are encountered at 
different places. If these so-called transpositions are not recognized, the search 
algorithm unnecessarily expands identical subtrees. Therefore, it is profitable to 
recognize transpositions and to ensure that for each set of identical nodes, only 
one subtree is expanded. 

In computer-chess programs using a depth-first search algorithm, this idea is
realized by storing the result of a node’s investigation in a transposition table
(e.g., Q, G2). If an identical node is encountered in the search process, the result
is retrieved from the transposition table and used without further investigation.

If a (selective) best-first search algorithm (which stores the whole search tree 
in memory) is used, the search tree is converted into a search graph, by joining 
identical nodes into one node, thereby merging the subtrees. 

These common ways of dealing with transpositions contain an important flaw: 
determining whether nodes are identical is not the same as determining whether 
the search states represented by the nodes are identical. For two reasons, the 
path leading to a node cannot be ignored. First, the history of a node may 
partly determine the legitimacy of a move. For instance, in chess, castling rights 
are not only determined by the position of the pieces on the board, but also 
by the knowledge that in the position under investigation the King and Rook 
have not moved previously. Second, the history of a node may play a role in 
determining the value of a node. For instance, a position may be declared a
draw by its three-fold repetition or by the so-called k-move rule m-

We refer to the first problem as the move-generation problem , and to the sec- 
ond problem as the evaluation problem. The combination of these two problems 
is referred to as the graph-history-interaction (GHI) problem (cf. and jHf). 

The GHI problem is a noteworthy problem not only in chess but in the field of 
game playing in general. Its applicability extends, though, to all domains where
the history of states is important. To mention just one example: in job-shop 
scheduling problems the costs of a task may be dependent on the tasks done so 
far, e.g., the cost of preparing a machine for performing some process depends 
on the state left after the previous process. 

A possible solution to the GHI problem is to include in all nodes the status of 
the relevant properties of the history of the node, i.e., the properties which may 
influence either the move generation or the evaluation of the node. A disadvan- 
tage of such a solution is that too many properties may be relevant, resulting in 
the need of storing large amounts of extra information in each node. For chess, 
we can distinguish four relevant properties of the history of a position (the first 
two being relevant for the move-generation problem, and the last two for the 
evaluation problem): 

1. the castling rights (Kingside and Queenside for both players), 

2. the en-passant capturing rights, 

3. the number of moves played without a capture or a pawn move, and 

4. the set of all positions played on the path leading to this node. 

The first two properties can be included in each node, without much overhead. 
The third property can be included in each node, but will reduce the frequency 
of transpositions drastically. The inclusion of the fourth property, necessary to 
determine whether a draw by three-fold repetition has been encountered, would
require too much overhead. As a result, in most chess programs, the first two 
properties are included in a node, while the last two are not. 
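
The usual compromise, in which only the first two properties become part of a
node's identity, can be sketched as a hashable key. The class and field names
below are hypothetical and the board is represented by a placement string
purely for illustration; the sketch only shows which history properties take
part in the comparison and which do not.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeKey:
    """Identity of a node: position plus history properties 1 and 2 only."""
    board: str                  # piece placement
    white_to_move: bool
    castling: str               # property 1, e.g. "KQkq" or "-"
    en_passant: Optional[str]   # property 2, e.g. "e3" or None
    # Property 3 (moves since the last capture or pawn move) and property 4
    # (positions on the path) are deliberately left out of the key.


start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"

# Same position reached along two different paths: the keys coincide.
via_path_1 = NodeKey(start, True, "KQkq", None)
via_path_2 = NodeKey(start, True, "KQkq", None)

# Same piece placement but different castling rights: a different node.
rook_has_moved = NodeKey(start, True, "Qkq", None)

table = {via_path_1: "stored result"}
print(via_path_2 in table, rook_has_moved in table)
```

Because the path is not part of the key, two nodes that compare equal here may
still differ in value, which is exactly the evaluation problem.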

Depending on which properties are included in a node, the probability of two
nodes being identical will be reduced. If not all relevant properties are included
and transpositions are used, it is possible that incorrect conclusions are drawn
from the transpositions. Campbell mentioned that, contrary to best-first search
(which he calls selective search), in depth-first search the GHI problem occurs
relatively infrequently [5].

In this paper we give a solution to the GHI problem for best-first search
with only a few relevant properties included in a node. In Section 2 an example
of the GHI problem is given. Previous work on the GHI problem is discussed
in Section 3. In Section 4 the general solution to the GHI problem for
best-first search is described. A formalized description and the pseudo-code
for the implementation in proof-number (pn) search are given in Section 5.
Section 6 lists experiments with the new algorithm. It is compared to three
other pn-search variants. The results are presented in Section 7. Finally,
Section 8 provides conclusions.

2 An Example of the GHI Problem 

Figure 1 shows a pawn endgame position, taken from jHJ, where the GHI problem
can occur. White (to move) has achieved a winning position. However, we show
that it is possible to evaluate this position incorrectly as a draw. In this paper
we assume that a single repetition of positions evaluates as a draw, in contrast
with the FIDE ruling which stipulates that the same position must occur three
times.




Fig. 1. A pawn endgame (wtm).



In Figure 2 the relevant part of the search tree is pictured. In this article we
follow the notation of 0, i.e., for all and/or trees (or graphs) white squares
represent OR nodes (positions with the first player to move), and black circles
represent AND nodes (positions with the second player to move).

After the move sequence 1. Kb5? Ke6? 2. Ka6? Kd5 3. Kb5 Ke6
the position after move 1 is repeated (node E), and evaluated as a draw. Since
White does not have any better alternative on the third move, the position after
2. Ka6 (node H) should be evaluated as a draw. Backing up this draw leads
to the incorrect conclusion that node A evaluates as a draw. However, after the
winning move sequence 1. Ka5! Ke6 2. Ka6! the same position (node H) is
reached, which is (now) evaluated as a win after 2. ... Kd5 3. Kb5 Ke6 4.
(node G). Backing up this win leads to the correct conclusion that node
A evaluates as a win.




Fig. 2. The GHI problem in the pawn endgame.



An example of the general case is given in Figure 3. It shows an and/or
search tree with identical positions.¹ The values of the leaves (given in italics)
are seen from the OR player’s point of view. The values given next to the nodes
are back-up values. We note that the GHI problem can occur in any type of
and/or tree. However, to keep the example as clear as possible we have chosen
to show the example for a minimax game tree.

Fig. 3. A search tree with repetitions.

The terminal nodes E and G are a win for the OR player, and the terminal
nodes C and F are evaluated as a draw because of repetition of positions.
Propagating the evaluation values of the terminal nodes through the search tree
results in a win at the root. When making use of transpositions, every node
should occur only once in the tree. Assume that a parent generates its children
and that one of its children already exists in the tree. Then a connecting edge
from the parent to the existing node is made. This transforms the search tree
into a Directed Cyclic Graph (DCG) (Figure 4).

¹ In games such as chess, a repetition of positions is impossible after only two
ply (node C in the left subtree of node B and node F in the subtree of node D).
Our example disregards this characteristic for simplicity’s sake.




Fig. 4. The DCG corresponding with the tree of Figure 3. 

In this DCG it is difficult to determine unambiguously the value of node F
due to the GHI problem. The value of this node is dependent on the path leading
to it. Following the path A-B-C-F, child C of node F is a repetition and hence
F is evaluated as a draw, but following the path A-B-D-F, child C is not a
repetition and is not evaluated as a draw. Thus, in the DCG, node F has two
different values. Hence, in this example it is not possible to determine the value
of root A, since in the first case it is a draw, and in the second case it is a win,
due to the values of E and G.
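
The path dependence can be reproduced with a few lines of code. The graph below
is a hypothetical one, loosely modelled on the node names used above; its edges
are assumptions and do not claim to reproduce Figure 4. A node that already
occurs on the current path is scored as a draw, in line with the single
repetition convention adopted in this paper.

```python
WIN, DRAW = 1, 0

# Hypothetical graph: node -> (type, children); a terminal stores its value.
GRAPH = {
    "A": ("OR", ["B"]),
    "B": ("AND", ["C", "D"]),
    "C": ("OR", ["F", "E"]),
    "D": ("OR", ["F"]),
    "F": ("AND", ["C"]),
    "E": ("TERMINAL", WIN),
}


def evaluate(node, path=()):
    """Minimax value of node for the OR player, given the path leading to it."""
    if node in path:
        return DRAW  # repetition of a position on the current path
    kind, payload = GRAPH[node]
    if kind == "TERMINAL":
        return payload
    values = [evaluate(child, path + (node,)) for child in payload]
    return max(values) if kind == "OR" else min(values)


value_via_c = evaluate("F", ("A", "B", "C"))  # child C repeats a path node
value_via_d = evaluate("F", ("A", "B", "D"))  # child C is new on this path
print(value_via_c, value_via_d)
```

The same node F receives two different values, so storing a single value for F
and reusing it on the other path would be incorrect.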

3 A Review of Previous Work 

Although several authors have mentioned the GHI problem, so far no solution to
this problem has been described. Only provisional ideas have been given. Below,
we review the five most important ideas.²

Palay first identified the GHI problem SSJ. He suggested two “solutions”: (1) 
refrain from using graphs, and (2) recognize when the GHI problem occurs and 
handle accordingly. The first “solution” (apart from not being a real solution, 
it merely ignores the problem) had as a drawback that large portions of the 
graph would be duplicated every time a duplicate node occurred, wasting a 
large amount of time and memory. The second solution worked as follows. When 
the positions suffering from the GHI problem were recognized, the path from 
the repetition node upwards to the ancestor with multiple parents was split into 
separate paths. He did not implement this strategy, since he conjectured that 
such positions only occurred occasionally (the GHI problem occurred in three 
out of 300 test positions). A disadvantage of this solution is that the recognition 
of positions suffering from the GHI problem is not straightforward. 

Another idea for a solution originates from Thompson jBj. While building 
a tactical analyzer, Thompson used a Directed Cyclic Graph (DCG) represen- 
tation. He saw it suffering from the GHI problem D3- He cured the problem 
by taking into account the history of the node to be expanded. The value of 
this node was then, if necessary, corrected for its history. The newly-generated 
children were evaluated by doing αβ searches, yet neglecting their history. As
a consequence, the only history errors could occur at the leaves. These errors 
were corrected as soon as such a leaf was expanded, but it could happen that 
the expansion of a node was suppressed due to the error. 

Campbell discussed the GHI problem thoroughly, applying it to depth-first
search only [5]. The key in avoiding most occurrences of the GHI problem
appears to be iterative deepening. Some problems (called “draw-first”) can be
overcome.³ However, other problems, which he called “draw-last”, could not be
solved by his approach.⁴ Finally, he remarked that “the GHI problems occur
much more frequently in selective search programs, and require some solution in
order to achieve reasonably general performance. Both Palay’s and Thompson’s
approaches seem to be acceptable.” We conclude that Campbell gave a partial
solution for depth-first search, and no solution for best-first search.

² Berliner and McConnell suggested the use of conditional values as an idea to
solve the GHI problem 0[. They promised details in a forthcoming paper.

³ In the draw-first case node F in Figure 4 is first reached through path A-B-C-F
(and the value of node F is based on child C being a repetition) and later in the
search node F is reached through path A-B-D-F and the previous value of node F
is used.

⁴ In the draw-last case node F in Figure 4 is first reached through path A-B-D-F
(and the value of node F is based on child C being no repetition) and later in the
search node F is reached through path A-B-C-F and the previous value of node F
is used.

Baum and Smith stumbled on the GHI problem, when implementing their
best-first search algorithm BPIP (Best Play for Imperfect Players) 0. Baum and
Smith completely store the DCG in memory and grow it by using “gulps”. In
each gulp a fraction of the most interesting leaves is expanded. For each
parent-child edge e a subset S(e) was defined as the intersection of all ancestor
nodes and all descendant nodes of edge e. A DCG was claimed to be legitimate
(i.e., no nodes have to be split) if and only if, for all children C with more than
one parent P, S(e_PC) is independent of P. Their solution was as follows. Each
time a new leaf was created three possibilities were distinguished: (1) if the leaf
was a repetition it was evaluated as a draw, else (2) if a duplicate node existed in
the graph, these two nodes were merged on the condition that the resultant DCG
was legitimate, else (3) the node was evaluated normally. After leaf expansion
it was exhaustively investigated whether every node C with multiple parents
passed the S(e) test. If not, such a node C was split into several nodes C′, C″,
..., with distinct subsets S(e_PC). Then, the subtrees of the newly-created nodes
had to be rebuilt and re-evaluated. Baum and Smith gave this idea as a solution
to the GHI problem without the support of an implementation. Moreover they
remarked that “Implementation in a low storage algorithm would probably be
too costly”. We believe that the overhead introduced by our idea, described in
the next section, is much less than the overhead introduced by the idea of Baum
and Smith.

Schijf et al. investigated the problem E3 in the context of proof-number
search (pn search) |T|. They examined the problem in Directed Acyclic Graphs
(DAGs) and DCGs separately. They noted that, when the pn-search algorithm
for trees is used in DAGs, the proof and disproof numbers are not necessarily
correctly computed, and the most-proving node is not always found. Schijf
proved that the most-proving node always exists in a DAG m ■ Furthermore, he
formulated an algorithm for DAGs that correctly determines the most-proving
node. However, this algorithm is only of theoretical importance, since it has an
unfavourable time-and-memory complexity. Therefore, a practical algorithm was
developed. Surprisingly, only two minor modifications to the pn-search algorithm
for trees are needed for a practical algorithm for DAGs. The first modification
is that instead of updating only one parent, all parents of a node have to be
updated. The second modification is that when a child is generated, it has to be
checked whether this node is a transposition (i.e., if it was generated earlier). If
this is the case, the parent has to be connected to this node that has already
been generated. Schijf et al. note that this algorithm contains two flaws El-
First, the proof and disproof numbers do not represent the cardinality of the
smallest proof and disproof set, but these numbers are upper bounds to the real
proof and disproof numbers. Second, the node selected by the function
SelectMostProvingNode is not always equal to a most-proving node. However, it
still holds that if the node chosen is proved, the proof number of the root
decreases, whereas if this node is disproved, the disproof number of the root
decreases. In either case the proof or disproof number may decrease by more than
unity, as a result of the transpositions present. This algorithm has been tested
on tic-tac-toe m- For the problem of applying pn search to a DCG Schijf et al.
give a time-and-memory-efficient algorithm, which, however, sometimes
inaccurately evaluates nodes as a draw by repetition El. They remark that, as a
consequence, their algorithm is sometimes unable to find the goal, even though it
should have found it.
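
The first modification for DAGs, updating all parents instead of one, can be
sketched as follows. This is an illustration under stated assumptions, not the
code of Schijf et al.: the class and function names are hypothetical, and the
update uses the usual pn-search rules (at an OR node the proof number is the
minimum and the disproof number the sum over the children; at an AND node the
roles are swapped).

```python
import math


class PNNode:
    def __init__(self, kind, proof=1, disproof=1):
        self.kind = kind  # "OR" or "AND"
        self.proof, self.disproof = proof, disproof
        self.children, self.parents = [], []


def link(parent, child):
    """Connect parent and child; a transposition simply gains another parent."""
    parent.children.append(child)
    child.parents.append(parent)


def recompute(node):
    proofs = [c.proof for c in node.children]
    disproofs = [c.disproof for c in node.children]
    if node.kind == "OR":
        node.proof, node.disproof = min(proofs), sum(disproofs)
    else:
        node.proof, node.disproof = sum(proofs), min(disproofs)


def update_ancestors(node):
    """Propagate a change through every parent (terminates on acyclic graphs)."""
    for parent in node.parents:
        recompute(parent)
        update_ancestors(parent)


root, x, y = PNNode("OR"), PNNode("AND"), PNNode("AND")
shared, other = PNNode("OR"), PNNode("OR")
for parent, child in [(root, x), (root, y), (x, shared), (x, other), (y, shared)]:
    link(parent, child)

shared.proof, shared.disproof = 0, math.inf  # the shared leaf is proved
update_ancestors(shared)
print(x.proof, y.proof, root.proof)
```

Updating only one parent of the shared leaf would leave the other parent, and
possibly the root, with stale numbers.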



4 BTA: An Enhanced DCG Algorithm 

In this section we describe a new and correct algorithm (denoted BTA: Base- 
Twin Algorithm) for solving the GHI problem for best-first search. The BTA 
algorithm is based on the distinction of two types of node, termed base nodes and 
twin nodes. The purpose of these types is to distinguish between equal positions 
with different history. Although it was known in the DCG algorithm described 
by Schijf et al. El that nodes sometimes may be incorrectly evaluated as a 
draw, their algorithm was unable to note when this occurs. We have devised an 
alternative in which a sufficient set of relevant properties for correct evaluation 
is recorded. We have chosen to include in a node only a small number of relevant 
properties. The reasons for not including all relevant properties are: 

— some properties are only relevant for a small number of nodes, 

— the more properties are included, the lower the frequency of transpositions, 
and 

— some properties require too much overhead and/or take up too much space 
when included in a node. 

The move-generation problem (cf. Section 1) can easily be solved by
including the relevant properties (in chess these are the castling rights and the
en-passant capturing rights) into each node. Hence, only the evaluation problem
(cf. Section 1) needs to be solved. We have chosen to describe the solution of
repetition of positions, since repetition of positions occurs in many search
problems, and the k-move rule is a special rule which seldom shows up in practice.
As mentioned before, we assume that a single repetition of positions results in
a draw.

We further distinguish between terminal nodes and leaves. A terminal node 
represents a terminal position, i.e., a position where the rules of the game deter- 
mine whether the result is a win, a draw, or a loss. Leaves are nodes which do 
not have children (yet). Leaves include terminal nodes and nodes which are not
yet expanded. 

4.1 Our Representation of a DCG

Basically the GHI problem occurs because the search tree is transformed into 
a DCG by merging nodes representing the same position, but having a differ- 
ent history. To avoid such an undesired coalescence, we propose an enhanced 
representation of a DCG. In the graph we distinguish two types of node: base 
nodes and twin nodes. After a node is generated, it is looked up in the graph by 
using a pointer-based table. If it does not exist, it is marked as a base node. If 
it exists, it is marked as a twin node , and a pointer to its base node is created. 
Thus, any twin node points to its base node, but a base node does not point to 
any of its twin nodes. Only base nodes can be expanded. The difference with 
the “standard implementation” of a DCG is that if two or more nodes are repre- 
senting the same position (ignoring history) they are not merged into one node. 
However, their subtree is generated only once. In general, a twin node may have 
a value different from its base node, although they represent the same position. 
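
A minimal sketch of this representation is given below. The class names, the
dictionary used as lookup table, and the string positions are assumptions for
illustration; the paper's own data structures may differ.

```python
class Node:
    def __init__(self, position, parent=None, base=None):
        self.position = position
        self.parent = parent
        self.base = base      # None for a base node, else a pointer to the base node
        self.children = []
        self.value = None     # twin and base may hold different values

    @property
    def is_twin(self):
        return self.base is not None


class Graph:
    def __init__(self):
        self.table = {}       # position -> base node

    def create(self, position, parent=None):
        """Create a node; it becomes a twin if the position already exists."""
        base = self.table.get(position)
        node = Node(position, parent, base)
        if base is None:
            self.table[position] = node   # first occurrence: base node
        if parent is not None:
            parent.children.append(node)
        return node


g = Graph()
a = g.create("A")
c = g.create("C", a)
d = g.create("D", a)
f_base = g.create("F", c)   # first generated as child of C: base node
f_twin = g.create("F", d)   # generated again as child of D: twin node

f_base.value, f_twin.value = "draw", "win"
print(f_twin.is_twin, f_twin.base is f_base, f_base.value, f_twin.value)
```

The twin keeps its own value and its own parent, while the subtree hangs only
below the base node, so it is generated once.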

Figure 5 exhibits our implementation of the DCG given in Figure 4 (assuming
that the position corresponding with node F is first generated as child of node
C and only later as child of node D). Nodes in upper-case are base nodes, nodes
in lower-case are twin nodes. The dashed arrows are pointers from twin nodes to
base nodes. The problem mentioned in Figure 4 can now be handled by assigning
separate values to nodes F and f, and to C and c, depending on the paths leading
to the corresponding positions.




Fig. 5. Our DCG with base nodes and twin nodes corresponding with the DCG 
of Figure 4. 



4.2 The BTA Algorithm as Solution 

As stated before, encountering a repetition of positions in node p does not mean
that the repetition signals a real draw (defined as the inevitability of a repetition
of positions under optimal play). To handle the distinction, we introduce the new
concept of possible-draw. Node p is marked as a possible-draw if the node is a
repetition of a node P in the search path. (Whether a possible-draw also is a
real draw depends on the history.) Then the depth of node P in the search path
(termed the possible-draw depth) is stored in node p.

The BTA algorithm for best-first search consists of three phases. Phase 1 
deals with the selection of a node. Phase 2 evaluates the selected node. Phase 
3 backs up the new information through the search path. The three phases are 
repeatedly executed until the search process is terminated. 



Phase 1: Select the Best Node. In phase 1 a node is selected for evaluation 
(see footnote 5). First, the root is selected (for further selection, see below). 
Then, for each selected node, two cases exist: 

1. if a child of the selected node is marked as a possible-draw, and the remaining 
children are either real draws, or marked as possible-draws, then the selected 
node is marked as a possible-draw and the corresponding possible-draw depth 
is set to the minimum of the possible-draw depths of the children. 
Subsequently, all possible-draw markings from the children are removed and the 
parent of the selected node is re-selected for investigation; 

2. otherwise, the best child is selected for investigation, ignoring the children 
which are either real draws, or marked as possible-draws. 

Assume that a node at depth d in the search path is marked as a possible-draw 
and the corresponding possible-draw depth is equal to d. This implies that the 
possible-draw marking of this node is based solely on repetitions of positions in 
the subtree of the node and on real draws. Therefore, the node is a real draw by 
repetition, independent of the history of the node. Hence the node is evaluated 
accordingly. 
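
The rule of case 1, together with the test for a real draw by repetition, can be sketched in Python as follows. This is an illustration under our own assumptions: each child is described only by its possible-draw depth (None if unmarked) and a flag stating whether it is a real draw, and the function name is hypothetical.

```python
from typing import List, Optional, Tuple


def propagate_possible_draw(
    node_depth: int,
    child_pd_depths: List[Optional[int]],
    child_is_real_draw: List[bool],
) -> Tuple[bool, Optional[int], bool]:
    """Return (marked, possible-draw depth, is real draw) for the parent node."""
    marked_depths = [d for d in child_pd_depths if d is not None]
    all_blocked = all(
        d is not None or real
        for d, real in zip(child_pd_depths, child_is_real_draw)
    )
    if not marked_depths or not all_blocked:
        return (False, None, False)        # case 2: a best child is still available
    pd_depth = min(marked_depths)          # case 1: minimum over the marked children
    # If the possible-draw depth equals the node's own depth, every repetition
    # lies inside the node's subtree, so the draw does not depend on the history.
    return (True, pd_depth, pd_depth == node_depth)


# A node at depth 4 whose children have possible-draw depths 2 and 3.
at_depth_four = propagate_possible_draw(4, [2, 3], [False, False])
# A node at depth 2 whose children both have possible-draw depth 2.
at_depth_two = propagate_possible_draw(2, [2, 2], [False, False])
```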

The selection of a node is repeated until (1) a real draw by repetition has 
been encountered, or (2) (a twin node of) a base node with known game-theoretic 
value has been found (see footnote 6), or (3) a leaf has been found. 

The selection of a node in the BTA algorithm is illustrated below. In Figure 6 
part of a search graph is depicted. The selection starts at the root (node A). 
Assume the traversal is in a left-to-right order. Then, at a certain point, node c 
is selected, and marked as a possible-draw because it is a repetition of node C at 
depth two in the search path (see Figure 6; the equal sign represents the 
possible-draw marking and the subscript two represents the possible-draw depth). 

Fig. 6. Encountering the first repetition c. 

After marking node c as a possible-draw, the parent of this node (node D) is 
re-selected and marked as a possible-draw, with the same possible-draw depth as 
node c. Further, the possible-draw marking of node c is removed. After marking 
node D as a possible-draw, its parent C is re-selected. The next best child (not 
marked as a possible-draw) E is selected. Continuing this procedure, at a certain 
point child d of node F is selected. The child c of twin node d is found by 
directing the search to the base node D of node d. Node c is (again) marked as 
a possible-draw because it is a repetition of node C at depth two in the search 
path. See Figure 7. 

5 We assume that the selection of a node proceeds in a top-down fashion. 

6 This is possible, because a base node does not point to its twin nodes. If the 
game-theoretic value of a twin node becomes known, its corresponding base node is 
evaluated accordingly, but other twin nodes remain unchanged. 




Fig. 7. Encountering the second repetition c. 



After the re-marking of node c as a possible-draw, the parent of this node 
(twin node d) is re-selected and marked as a possible-draw, with the same 
possible-draw depth as node c. Thereafter, the possible-draw marking of node c 
is removed (for the second time). After marking node d as a possible-draw, its 
parent F is re-selected. The next best child (not marked as a possible-draw) e is 
selected. This node is a repetition of node E at depth three in the search path, 
and is now marked as a possible-draw. See Figure 8. 






Dennis M. Breuker et al. 







Fig. 8. Encountering the repetition e. 



After marking node e as a possible-draw, the parent of this node (node F) is 
re-selected. All its children are marked as a possible-draw. Therefore, node F is 
also marked as a possible-draw, with a possible-draw depth of two (the minimum 
of the possible-draw depths of the children). Further, the possible-draw markings 
of all children are removed. See Figure 9. 



Fig. 9. Marking node F as a possible-draw. 



After marking node F as a possible-draw, the parent of this node (node E) is 
re-selected and marked as a possible-draw, with the same possible-draw depth as 
node F. Subsequently, the possible-draw marking of node F is removed. After 
marking node E as a possible-draw, its parent (node C) is re-selected. However, 
all its children are marked as a possible-draw. Therefore, node C is also marked as 
a possible-draw, with a possible-draw depth of two (the minimum of the 
possible-draw depths of the children). Again, the possible-draw markings of all 
children are removed. See Figure 10. 



A Solution to the GHI Problem for Best-First Search 







Fig. 10. Marking node C as a possible-draw. 



Now the selection process finishes, since node C is marked as a possible-draw 
and its corresponding possible-draw depth is equal to the depth of the node 
in the search path. This means that all continuations from C lead, in one or 
another way, to repetitions occurring in the subtree of node C. Therefore, node C 
is evaluated as a real draw by repetition, independent of the history of the node, 
but on the basis of its potential continuations. 



Phase 2: Evaluate the Best Node. In phase 2 the selected node (say P) is 
evaluated. Three cases are distinguished. 

1. If P is a real draw by repetition, it is evaluated as a draw. The corresponding 
base node (if existing) is also evaluated as a draw. 

2. If P is a twin node and its corresponding base node is a terminal node, P 
becomes a terminal node as well and is evaluated as such. 

3. If P is a leaf, it is expanded, the children are evaluated and P is evaluated 
using the evaluation values of the children. 



Phase 3: Back up the New Information. In phase 3 the value of the selected 
node is updated to the root (see footnote 7) and all possible-draw markings are 
removed. In contrast to the tree algorithm, in the BTA updating process nodes 
marked as a possible-draw may occur. The back-up value of a node is determined 
by using only the evaluation values of children not marked as a possible-draw. 
Thus, the children marked as a possible-draw are ignored, because in the next 
iteration the search could be mistakenly directed to one of these children, whereas 
this child was a repetition in the current path, not giving any new information. 
After establishing the back-up value of a node, the possible-draw markings of 
the children are removed. 

7 In a DCG there can exist more than one path from a node to the root. However, 
only the path along which the node was selected is taken into account. Other paths, 
if any, may be updated after other selection processes. 



5 The Pseudo-Code of the Algorithm 

In this section an implementation of the BTA algorithm in pn search is 
given. An explanation following the three phases of Section 4 provides details 
on the seven relevant pn-search procedures and functions. For chess, the goal of 
pn search is finding a mate. A loss and a real draw are in this respect equivalent 
(i.e., they are no win). Hence, two types of node with a known game-theoretic 
value exist: proved nodes (win) and disproved nodes (no win possible). A proved 
or disproved node is called a solved node. 
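
For readers unfamiliar with pn search, the sketch below shows the usual way in which proof and disproof numbers of an internal node are derived from its children: an OR node takes the minimum proof number and the sum of the disproof numbers, an AND node the reverse. This is our reading of the standard tree algorithm, which the paper does not detail; the function name and the tuple representation are assumptions made for this example.

```python
import math
from typing import List, Tuple

INF = math.inf
PROVED = (0, INF)       # win for the attacker
DISPROVED = (INF, 0)    # no win possible (loss or real draw)


def combine(node_type: str, children: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (proof, disproof) of an internal node from its children's numbers."""
    proofs = [proof for proof, _ in children]
    disproofs = [disproof for _, disproof in children]
    if node_type == "OR":
        return (min(proofs), sum(disproofs))
    return (sum(proofs), min(disproofs))    # AND node


or_value = combine("OR", [(1, 1), (2, 3)])
and_value = combine("AND", [(1, 1), (2, 3)])
or_with_proved_child = combine("OR", [PROVED, (1, 1)])
and_with_disproved_child = combine("AND", [DISPROVED, (1, 1)])
```

A node is solved as soon as its proof number or its disproof number equals 0.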

5.1 Phase 1: Select the Most-Proving Node 

Phase 1 of the algorithm deals with the selection of a (best) node for evaluation. 
This node is termed the most-proving node. In Figure 11 the main BTA pn-search 
algorithm is shown. The only parameter of the procedure is root, being 
the root of the search tree. The BTA algorithm resembles the standard tree 
algorithm for pn search, a difference being that procedure UpdateAncestors is 
called with the parent of the most-proving node as parameter instead of the 
most-proving node itself, since the most-proving node already has been evaluated 
in procedure ExpandNode. 



procedure BTAProofNumberSearch( root )
  Evaluate( root )
  SetProofAndDisproofNumbers( root )
  root.expanded := false
  root.depth := 0
  while root.proof ≠ 0 and root.disproof ≠ 0 and
        ResourcesAvailable() do begin
    mostProvingNode := SelectMostProvingNode( root )
    ExpandNode( mostProvingNode )
    UpdateAncestors( mostProvingNode.parent, root )
  end
  if root.proof = 0 then root.value := true
  elseif root.disproof = 0 then root.value := false
  else root.value := unknown /* resources exhausted */
end /* BTAProofNumberSearch */



Fig. 11. The BTA pn-search algorithm for DCGs. 




The procedures Evaluate and SetProofAndDisproofNumbers and the function 
ResourcesAvailable are identical to the same procedures and function in the 
standard tree algorithm, and not detailed here. The function 
SelectMostProvingNode finds a most-proving node according to certain conditions. 
The function is given in Figure 12. The only parameter of the function is node, 
being the root of the (sub)tree where the most-proving node is located. 

function SelectMostProvingNode( node )
  if NodeHasBaseNode( node )
    then baseNode := BaseNode( node )
    else baseNode := node
  /* 1: Base node has been solved */
  if baseNode.proof = 0 or baseNode.disproof = 0
    then return node
  elseif Repetition( node )
    then begin /* 2: Repetition of position */
      MarkAsPossibleDraw( node )
      ancestorNode := FindEqualAncestorNode( node )
      node.pdDepth := ancestorNode.depth
      return SelectMostProvingNode( node.parent )
  end elseif not baseNode.expanded then /* 3: Leaf */
    return node
  else begin /* 4: Internal node; look for child */
    bestChild := SelectBestChild( node, baseNode, pdPresent )
    if bestChild = NULL then begin
      if pdPresent then begin
        MarkAsPossibleDraw( node )
        node.pdDepth := ∞
        for i := 1 to baseNode.numberOfChildren do begin
          if PossibleDrawSet( baseNode.children[i] ) then
            if baseNode.children[i].pdDepth < node.pdDepth then
              node.pdDepth := baseNode.children[i].pdDepth
          UnMarkAsPossibleDraw( baseNode.children[i] )
        end
        if node.depth = node.pdDepth then return node
        else return SelectMostProvingNode( node.parent )
      end else begin
        /* All children are solved, so choose any one */
        baseNode.proof := baseNode.children[1].proof
        baseNode.disproof := baseNode.children[1].disproof
        return node
      end
    end else begin
      bestChild.depth := node.depth + 1
      return SelectMostProvingNode( bestChild )
    end
  end
end /* SelectMostProvingNode */



Fig. 12. The function SelectMostProvingNode. 
The function starts by examining whether the node under investigation (say P) 
is a twin node. If so, then the investigation proceeds with the associated base 
node. 

If P has been solved (case 1), P is returned, because the graph has to be 
backed up using this new information. 

If P has not been solved, it is examined whether P is a repetition in the 
current path (case 2). If so, it is marked as a possible-draw. Its ancestor 
transposition node in the current path is looked up, and the pdDepth (possible-draw 
depth) of the node becomes equal to the depth in the search path of the ancestor 
node (see footnote 8). Since it is not useful to examine a repetition node further, 
the selection of the most-proving node is directed to the parent of P. 

If P has not been solved and is not a repetition in the current path, it is 
examined whether P is a leaf (case 3). If so, P is the most-proving node which 
has to be expanded, and P is returned. 

Otherwise (case 4), the best child is selected by the function SelectBestChild, 
to be discussed later. If no best child was found, it means that every child is 
either solved (proved in case of an AND node, and disproved in case of an OR 
node) or is marked as a possible-draw. If any of the children is marked as a 
possible-draw, P is marked as a possible-draw as well. The pdDepth of the node 
is set to the minimum of the children’s pdDepths and the markings of all children 
are removed (see Section 4). 

In Figure 13 the function SelectBestChild is listed. The function has three 
parameters. The first parameter (node) is the parent from which the best child 
will be selected. The second parameter (baseNode) is the base node of that 
parent (see footnote 9). Finally, the third parameter (pdPresent, meaning possible 
draw present) indicates whether one of the children is marked as a possible-draw. 
The parameter pdPresent is initialized by the function SelectBestChild. If the node 
is an OR node, a child marked as a possible-draw will not be selected as best child, 
since it gains nothing and the goal (win) cannot be reached. A best child (of an 
OR node) is a child with the lowest proof number. If the node is an AND node, a 
child marked as a possible-draw is a best child, since the player to move in the 
AND node is satisfied with a repetition (thereby making it impossible for the 
opponent to reach the goal). Otherwise, a best child (of an AND node) is a child 
with the lowest disproof number. This best child is returned. If the best child is 
either solved or marked as a possible-draw, NULL is returned. 
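
The selection rule just described can be sketched in Python as follows. This is an illustrative re-expression, not the paper's code: the Child record and the tuple return value (best child, pdPresent) are assumptions made for this example, whereas the pseudo-code passes pdPresent as a parameter.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Child:
    name: str
    proof: float
    disproof: float
    possible_draw: bool = False


def select_best_child(node_type: str, children: List[Child]) -> Tuple[Optional[Child], bool]:
    """Return (best child or None, whether a possible-draw child is present)."""
    best, best_value, pd_present = None, math.inf, False
    for child in children:
        if child.possible_draw:
            pd_present = True
            if node_type == "AND":
                # The defender is satisfied with a repetition: no child to descend into.
                return None, True
            continue                        # OR node: skip the possible-draw child
        value = child.proof if node_type == "OR" else child.disproof
        if value < best_value:              # solved children (value infinity) never win
            best, best_value = child, value
    return best, pd_present


kids = [Child("a", 3, 1), Child("b", 2, 2, possible_draw=True), Child("c", 5, 1)]
or_best, or_pd = select_best_child("OR", kids)
and_best, and_pd = select_best_child("AND", kids)
```

In the OR case child a is chosen although b has the lower proof number, because b is marked as a possible-draw.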

5.2 Phase 2: Evaluate the Most-Proving Node 

After the most-proving node has been found, it has to be expanded and 
evaluated. Phase 2 of the algorithm performs this task. Figure 14 provides the 
procedure ExpandNode. The only parameter is node, being the node to be expanded. 



8 The variable pdDepth will act as an indicator of the lowest level in the tree at which 
there are nodes having repetition nodes in their subtrees. 

9 We note that if the parent is a base node itself, then the base node is equal to the 
parent. 
function SelectBestChild( node, baseNode, pdPresent )
  bestChild := NULL
  bestValue := ∞
  pdPresent := false
  if node.type = OR then begin /* OR node */
    for i := 1 to baseNode.numberOfChildren do begin
      if PossibleDrawSet( baseNode.children[i] ) then
        pdPresent := true
      elseif baseNode.children[i].proof < bestValue
        then begin
          bestChild := baseNode.children[i]
          bestValue := bestChild.proof
        end
    end
  end else begin /* AND node */
    for i := 1 to baseNode.numberOfChildren do begin
      if PossibleDrawSet( baseNode.children[i] ) then begin
        pdPresent := true
        bestChild := NULL /* possible-draw child is best, so NULL is returned */
        break
      end
      if baseNode.children[i].disproof < bestValue
        then begin
          bestChild := baseNode.children[i]
          bestValue := bestChild.disproof
        end
    end
  end
  return bestChild
end /* SelectBestChild */



Fig. 13. The function SelectBestChild. 



The procedure starts by establishing the base node of the node (see footnote 10). 
If the base node is solved (case 1), the node is evaluated accordingly. 

Otherwise, if the node is marked as a possible-draw (case 2) (and since it was 
chosen by function SelectMostProvingNode), it is evaluated as a real draw. 

In case 3 the node has to be expanded. All children are generated, and 
evaluated. If a generated child has no corresponding base node, the attribute 
expanded is initialized to false; if it has a corresponding base node, the attribute 
expanded has been initialized before. Then the node itself is initialized by 
procedure SetProofAndDisproofNumbers. 



10 We note that if the node is a base node itself, then the base node is equal to the 
node. 
procedure ExpandNode( node )
  if NodeHasBaseNode( node )
    then baseNode := BaseNode( node )
    else baseNode := node
  if baseNode.proof = 0 or baseNode.disproof = 0 then begin
    /* 1: base node already solved */
    node.proof := baseNode.proof
    node.disproof := baseNode.disproof
  end elseif PossibleDrawSet( node ) then begin
    /* 2: node has become a real draw */
    node.proof := ∞
    node.disproof := 0
    baseNode.proof := ∞
    baseNode.disproof := 0
  end else begin
    /* 3: node has to be expanded */
    GenerateAllChildren( baseNode )
    for i := 1 to baseNode.numberOfChildren do begin
      Evaluate( baseNode.children[i] )
      SetProofAndDisproofNumbers( baseNode.children[i] )
      if not NodeHasBaseNode( baseNode.children[i] ) then
        baseNode.children[i].expanded := false
    end
    SetProofAndDisproofNumbers( baseNode )
    baseNode.expanded := true
    node.proof := baseNode.proof
    node.disproof := baseNode.disproof
  end
end /* ExpandNode */



Fig. 14. The procedure ExpandNode. 



5.3 Phase 3: Back up the New Information 

Phase 3 of the algorithm has the task of backing up the evaluation value of the 
most-proving node. The procedure to update the values of the nodes in the path 
is listed in Figure 15. The procedure has two parameters. The first parameter 
(node) is the node to be updated, while the second parameter (root) is the root 
of the search tree. Depending on the node type, UpdateOrNode (Figure 16) or 
UpdateAndNode (Figure 17) is performed. 

procedure UpdateAncestors( node, root )
  while node ≠ nil do begin
    if NodeHasBaseNode( node )
      then baseNode := BaseNode( node )
      else baseNode := node
    if node.type = OR
      then UpdateOrNode( node, baseNode )
      else UpdateAndNode( node, baseNode )
    node := node.parent /* parent in current path */
  end
  if PossibleDrawSet( root ) then
    UnMarkAsPossibleDraw( root )
end /* UpdateAncestors */

Fig. 15. The procedure UpdateAncestors. 

The parameters of UpdateOrNode are node and baseNode. The algorithm 
basically is the same as the OR part of procedure SetProofAndDisproofNumbers. 
It only differs when a child is marked as a possible-draw. In that case, the child is 
discarded so its value is not used when calculating the back-up value of the node. 
Then, the possible-draw marking of the child is removed. If the node appears to 
be disproved (since all children are either disproved or marked as a possible-draw) 
and a repetition child exists, the value of the node is calculated by procedure 
SetProofAndDisproofNumbers. Otherwise, the value has been calculated correctly. 
If the node has been solved, its base node is evaluated accordingly. 

procedure UpdateOrNode( node, baseNode )
  min := ∞
  sum := 0
  pdPresent := false
  for i := 1 to baseNode.numberOfChildren do begin
    if PossibleDrawSet( baseNode.children[i] ) then begin
      pdPresent := true
      proof := ∞
      disproof := 0
      UnMarkAsPossibleDraw( baseNode.children[i] )
    end else begin
      proof := baseNode.children[i].proof
      disproof := baseNode.children[i].disproof
    end
    if proof < min then min := proof
    sum := sum + disproof
  end
  if min = ∞ and pdPresent then
    SetProofAndDisproofNumbers( node )
  else begin
    node.proof := min
    node.disproof := sum
  end
  if node.proof = 0 or node.disproof = 0
    then begin /* node solved */
      baseNode.proof := node.proof
      baseNode.disproof := node.disproof
    end
end /* UpdateOrNode */

Fig. 16. The procedure UpdateOrNode. 
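
The possible-draw handling of UpdateOrNode can be sketched in Python as follows. The sketch works on plain tuples (proof, disproof, possible_draw) instead of nodes, and the fall-back branch reflects our assumption that SetProofAndDisproofNumbers applies the standard OR rule to the stored numbers of all children; the paper does not detail that procedure.

```python
import math
from typing import List, Tuple

ChildInfo = Tuple[float, float, bool]   # (proof, disproof, marked as possible-draw)


def update_or_node(children: List[ChildInfo]) -> Tuple[float, float]:
    """Back up an OR node while ignoring children marked as possible-draws."""
    min_proof, sum_disproof, pd_present = math.inf, 0, False
    for proof, disproof, possible_draw in children:
        if possible_draw:
            pd_present = True
            proof, disproof = math.inf, 0   # discard the child's own numbers
        min_proof = min(min_proof, proof)
        sum_disproof += disproof
    if min_proof == math.inf and pd_present:
        # The node only looks disproved because possible-draw children were
        # ignored; fall back to the standard OR rule over the stored numbers
        # (assumed behaviour of SetProofAndDisproofNumbers).
        return (min(p for p, _, _ in children), sum(d for _, d, _ in children))
    return (min_proof, sum_disproof)


ignored_pd_child = update_or_node([(2, 3, False), (1, 1, True)])
fallback_case = update_or_node([(math.inf, 0, False), (4, 2, True)])
truly_disproved = update_or_node([(math.inf, 0, False), (math.inf, 0, False)])
```

The fall-back keeps a node from being disproved on the strength of a repetition that holds only in the current path.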



procedure UpdateAndNode( node, baseNode )
  min := ∞
  sum := 0
  for i := 1 to baseNode.numberOfChildren do begin
    proof := baseNode.children[i].proof
    disproof := baseNode.children[i].disproof
    sum := sum + proof
    if disproof < min then min := disproof
  end
  /* AND node: proof is the sum, disproof the minimum */
  node.proof := sum
  node.disproof := min
  if node.proof = 0 or node.disproof = 0
    then begin /* node solved */
      baseNode.proof := node.proof
      baseNode.disproof := node.disproof
    end
end /* UpdateAndNode */



Fig. 17. The procedure UpdateAndNode. 



The two parameters of UpdateAndNode are equal to the parameters of procedure 
UpdateOrNode. The procedure differs from the AND part of the procedure 
SetProofAndDisproofNumbers only when the node is solved: in that case the value 
of its base node is evaluated accordingly (see footnote 11). 

6 Experimental Set-up 

6.1 The Proof-Number Search Engine 

The proof-number search engine has been implemented in a straightforward 
chess program. The only goal of the pn-search algorithm is searching for mate. 
We distinguish between the attacker and the defender. A position is proved if 
the attacker can mate, while draws (stalemate, repetition of positions, and the 
50-move rule) and mates by the defender are defined to be disproved positions 
for the attacker. 

11 We note that it is impossible for a child of an AND node to be marked as a 
possible-draw, since in that case the search for a most-proving node would have 
been terminated in an earlier phase, and the parent already would have been 
marked as a possible-draw. 
6.2 The Test Set 

Since proof-number search operates best when searching for mates in chess, 
we used a set of mating problems. Krabbe’s 35 positions are indicated 
by kx, in which x refers to the diagram number in the source. The diagrams 
are 8, 35, 37, 38, 40, 44, 60, 61, 78, 192, 194, 195, 196, 197, 198, 199, 206, 207, 
208, 209, 210, 211, 212, 214, 215, 216, 217, 218, 219, 220, 261, 284, 317, 333 and 
334. Reinfeld’s 82 positions are indicated by rx, x again referring to the problem 
number in the source, this time running over 1, 4, 5, 6, 9, 12, 14, 27, 35, 49, 50, 
51, 54, 55, 57, 60, 61, 64, 79, 84, 88, 96, 97, 99, 102, 103, 104, 105, 132, 134, 136, 
138, 139, 143, 154, 156, 158, 159, 160, 161, 167, 168, 172, 173, 177, 179, 182, 184, 
186, 188, 191, 197, 201, 203, 211, 212, 215, 217, 218, 219, 222, 225, 241, 244, 246, 
250, 251, 252, 253, 260, 263, 266, 267, 278, 281, 282, 283, 285, 290, 293, 295 and 
298. This results in a test set of 117 positions. 

6.3 The Setting 

Our BTA algorithm, denoted by BTA, is compared with the following three 
pn-search variants: 

1. the standard tree algorithm, denoted by Tree, 

2. a DAG algorithm, developed by Schijf, denoted by DAG, and 

3. an (incorrect) DCG algorithm, developed by Schijf et al., denoted by 
DCG. 

The results for the DAG and DCG algorithm will be taken from the literature. 
In all implementations, the move ordering is identical. All four algorithms 
searched for a maximum of 500,000 nodes per test position. After 500,000 nodes 
the search was terminated and the problem was marked as not solved. Under 
these conditions 10 positions (k8, k40, k78, k195, k209, k210, k220, r96, r105, 
r201) turned out to be not solvable by any of the four algorithms. Therefore they 
are not taken into account in the next section. 



7 Results 

To verify our solution we have first tested the well-known position in which the 
GHI problem occurs (see footnote 12). Tree finds a solution within 482,306 nodes. 
DCG, ignoring the history of a position, incorrectly states that White cannot win 
(due to the GHI problem). Our BTA does find a solution within 10,694 nodes. This 
provides evidence that this occurrence of the GHI problem has been correctly 
handled. BTA shows the benefit of being a DCG algorithm, as evidenced by the 
decrease in number of nodes investigated by a factor of roughly 45 as compared 
to Tree. 

12 We note that for this problem the goal for White was set to promotion to Queen 
(without Black being able to capture it on the next ply) instead of mate. Further, 
the search was restricted to the 5x5 a4-e8 board. This helps to find the solution 
faster, but does not influence the occurrence of the GHI problem. 
Thereafter, we have performed the experiments with the test set described 
in Section 6.2. The outcomes are summarized in Table 1. The first column shows 
the four pn-search variants. The number of positions solved by each algorithm 
is given in the second column. Exactly 96 positions were solved by all four 
algorithms. In the third column the total number of nodes evaluated for the 96 
positions are listed. The additional positions solved per algorithm are as follows. 

- for Tree: k208, k215, r281; 

- for DAG: k208, k215, k216, r168, r182, r281; 

- for DCG: k44, k60, k217, k284, r168, r182, r252; 

- for BTA: k44, k60, k208, k215, k216, k217, k284, r168, r182, r252, r281. 





        # of pos. solved    Total nodes 
        (out of 117)        (96 positions) 

Tree     99                 4,903,374 
DAG     102                 3,222,234 
DCG     103                 2,482,829 
BTA     107                 2,844,024 

Table 1. Comparing four pn-search variants. 



Obviously, Tree investigates the largest number of nodes, the easy 
explanation being that this algorithm does not recognize transpositions. Further, 
DCG examines the smallest number of nodes: this algorithm sometimes prematurely 
disproves positions; hence, on average fewer nodes have to be examined. 
However, if such a prematurely disproved position does lead to a win and the node 
is important to the principal variation of the tree, the win can be missed, as 
happens in the positions k208, k215, k216 and r281. This is already mentioned 
by Schijf et al. 

From Table 1 it further follows that BTA performs best. It solves each 
position which was solved by at least one of the other three algorithms. 
Furthermore, the four positions which were incorrectly disproved by DCG were proved 
by BTA. Compared to the tree algorithm, BTA solves eight additional positions 
and uses only 58% of the number of nodes: a clear improvement. The reduction 
in nodes compared to DAG is still 11.7%. BTA searches 14.5% more nodes than 
DCG (equivalently, DCG searches 12.7% fewer nodes than BTA), which is already 
explained by the unreliability of the latter. We feel 
that the advantage of the larger number of solutions found heavily outweighs 
the disadvantage of the increase in nodes searched. We note that the selection 
of the most-proving node in BTA can be costly in positions with many possible 
transpositions. However, in these types of position the reduction in the number 
of nodes searched is even larger than in “normal” positions. 
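
As a cross-check of the node-count comparisons, the following short Python sketch recomputes the ratios from the totals listed in Table 1. The variable names are ours. Note that 12.7% is the difference between BTA and DCG expressed as a fraction of the BTA total, while the same difference is 14.5% of the DCG total.

```python
# Total nodes for the 96 positions solved by all four algorithms (Table 1).
nodes = {"Tree": 4_903_374, "DAG": 3_222_234, "DCG": 2_482_829, "BTA": 2_844_024}

bta_share_of_tree = 100 * nodes["BTA"] / nodes["Tree"]           # BTA as % of Tree
reduction_vs_dag = 100 * (1 - nodes["BTA"] / nodes["DAG"])       # % fewer than DAG
dcg_saving_vs_bta = 100 * (1 - nodes["DCG"] / nodes["BTA"])      # DCG vs BTA, % of BTA
bta_increase_vs_dcg = 100 * (nodes["BTA"] / nodes["DCG"] - 1)    # BTA vs DCG, % of DCG

for label, value in [
    ("BTA as share of Tree", bta_share_of_tree),
    ("reduction vs DAG", reduction_vs_dag),
    ("DCG saving as share of BTA", dcg_saving_vs_bta),
    ("BTA increase as share of DCG", bta_increase_vs_dcg),
]:
    print(f"{label}: {value:.1f}%")
```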

As a case in point we present Figure 18, corresponding with Diagram 216 in 
Krabbe. It is solved by our BTA algorithm (in 247,686 nodes) and by the 
DAG algorithm (in 366,336 nodes) and not by the two other algorithms (within 
500,000 nodes). Many transpositions (and many repetitions of positions) exist, 
since after 1. Ra5+ Kb8 White has a so-called zwickmühle and can position 
the Bishop anywhere along the a7-g1 diagonal for free. For instance, after 2. 
Ba7+ Ka8 3. Bb6+ Kb8 almost the same position with the same player 
to move is reached: the Bishop has moved from d4 to b6. At any time White 
can choose such a manoeuvre. For the chess-playing reader, the solution is 1. 
Ra5+! Kb8 2. Ba7+ Ka8 3. Bc5+! Kb8 4. Rb5+ Ka8 5. Re7! Bf7 
6. Ra5+ Kb8 7. Ba7+ Ka8 8. Bd4+! Kb8 9. Rb5+ Ka8 10. Rd7! 
Ng5 11. Ra5+ Kb8 12. Ba7+ Ka8 13. Bb6+ Kb8 14. Bxc7 mate. 

Fig. 18. Mate in 14 (wtm); (J. Kriheli). 



8 Conclusions 

In this article we have given a solution to the GHI problem, resulting in an 
improved DCG algorithm for pn search, denoted BTA (Base-Twin Algorithm). 
It is shown that in a well-known position, in which the GHI problem occurs when 
a naive DCG algorithm is used, our BTA algorithm finds the correct solution. 
The results on a test set of 117 selected positions support our claim. Despite the 
additional overhead to recognize positions suffering from the GHI problem, our 
BTA algorithm is hardly less efficient than other, non-reliable, DCG algorithms, 
and finds more solutions. 

We note that, though our algorithms are confined to pn search, the strategy 
used is generally applicable to any best-first search algorithm. The only 
important criterion for application is that a DCG is being built according to the 
best-first principle (choose some leaf node, expand that node, evaluate the children, 
and back up the result). We consider the GHI problem in best-first search to be 
solved. The importance of this statement is that with increasing availability of 
computer memory a growing tendency exists to use best-first search algorithms, 
like SSS* and variants thereof, or best-first fixed-depth algorithms, 
which no longer suffer from the GHI problem. What remains is solving the GHI 
problem for depth-first search. This will need a different approach, storing 
additional information in transposition tables rather than in the search tree/graph 
in memory. However, Campbell already noted that in depth-first search the 
frequency of GHI problems is considerably smaller than in best-first search. The 
solution of the GHI problem for depth-first search therefore seems to be of minor 
importance for practical use. 
Abstract. We investigate the best defence model of an imperfect 
information game. In particular, we prove that finding optimal strategies for 
this model is NP-complete in the size of the game tree. We then 
introduce two new heuristics for this problem and show that they outperform 
previous algorithms. We demonstrate the practical use and effectiveness 
of these heuristics by testing them on random game trees and on a hard 
set of problems from the game of Bridge. For the Bridge problem set, 
our heuristics actually outperform the human experts who produced the 
model solutions. 

Keywords: imperfect information, heuristics, complexity, Bridge 



1 Introduction 

We analyse the problem of finding optimal strategies for two player games of the 
form illustrated in Figure 1. In this game, played between MAX (square nodes) 
and MIN (circular nodes), the terminal values indicate the payoff to MAX of 
reaching each leaf node under any of five possible worlds w1, ..., w5. The state 
of the world that actually holds depends on information that MAX does not 
know, but to which he can attach a probability distribution (for example the 
toss of a coin or the deal of a deck of cards). For a more general game with n 
possible worlds, every leaf node of the tree has n payoffs, each corresponding to 
the utility for MAX of reaching that node in one of the n worlds. 

An example of a real-life game that can be fit to the model of Figure 1 is 
Bridge (for a concrete illustration, see Section 2). Recently, the assumptions used 
by human experts to analyse Bridge problems have been formalised in a best defence 
model [6], which we summarise here: 

A-I MIN has perfect information. 

A-II MIN chooses his strategy after MAX. 

A-III The strategy adopted by MAX is a pure strategy. 

Fig. 1. A game tree with five possible worlds 

This model is described as ‘best defence’ because it represents the strongest 
possible assumptions about the opponent: that MIN knows the actual state 
of the world (A-I) and chooses his strategy in the knowledge of MAX’s choices 
(A-II). It is used by human players because modelling the strongest possible 
opponents provides a lower bound on the payoff that can be expected when the 
opponents are less informed. The assumption that MAX chooses a pure strategy 
(A-III) also restricts the set of possible solutions to a finite (though possibly 
very large) set. 
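
To make assumptions A-I to A-III concrete, the following Python sketch evaluates one fixed pure MAX strategy under best defence: in every world MIN, knowing both the world and MAX's choices, picks the reply that minimises MAX's payoff, and the results are weighted by the world probabilities. The tree encoding (dictionaries with the keys "player", "name" and "children", and leaves as payoff lists) and the example tree are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Union

Leaf = List[float]               # one payoff per possible world
Tree = Union[Leaf, dict]         # internal nodes are dictionaries


def value_in_world(tree: Tree, strategy: Dict[str, int], world: int) -> float:
    """Payoff to MAX in one world when MIN replies optimally to `strategy`."""
    if isinstance(tree, list):
        return tree[world]
    if tree["player"] == "MAX":
        # A-III: MAX follows a pure strategy that cannot depend on the world.
        chosen = tree["children"][strategy[tree["name"]]]
        return value_in_world(chosen, strategy, world)
    # A-I and A-II: MIN knows the world and MAX's strategy, so he minimises.
    return min(value_in_world(child, strategy, world) for child in tree["children"])


def best_defence_value(tree: Tree, strategy: Dict[str, int], probs: List[float]) -> float:
    return sum(p * value_in_world(tree, strategy, w) for w, p in enumerate(probs))


# Two equally likely worlds; MAX chooses between two MIN nodes at the root.
example = {
    "player": "MAX", "name": "root",
    "children": [
        {"player": "MIN", "children": [[1, 0], [0, 1]]},   # MIN can always reach 0
        {"player": "MIN", "children": [[1, 0], [1, 0]]},   # MAX wins in world 0 only
    ],
}
left = best_defence_value(example, {"root": 0}, [0.5, 0.5])
right = best_defence_value(example, {"root": 1}, [0.5, 0.5])
```

Finding the best strategy by enumerating every pure MAX strategy in this way is exponential in general, which is consistent with the hardness result discussed in the paper.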

In this paper we review existing algorithms for analysing trees such as that of 
Figure 1. These algorithms are all heuristic in nature. To show that such 
heuristics are the best that can realistically be hoped for, we examine the complexity 
of the problem and demonstrate that finding optimal strategies for the best defence 
model is NP-complete in the size of the game tree. This means that arbitrary 
(imperfect information) trees are hard to analyse. This is in contrast to perfect 
information games, where arbitrary trees are easy to analyse (the minimax 
algorithm returns results in time linear in the size of the tree), but arbitrary games 
are hard to analyse, at least when their definitions generate trees of exponential 
size. 

To combat such intractability, we introduce two new heuristics, which we call 
beta-reduction and iterative biasing. Both of these tackle the non-local nature 
[6] of the types of game shown in Figure 1 by introducing dependencies between 
the choices made at MAX nodes. We demonstrate the practical importance of 
our new heuristics theoretically and experimentally, using both simple game 
trees and a large database of problems from the game of Bridge. For the Bridge 
database, we find that a combination of our heuristics actually outperforms the 
human experts that produced the model solutions. In the past, special-purpose 
Bridge programs for identifying complex positions such as squeezes have been 






developed m, but our heuristics represent the first general tree search algorithm 
capable of consistently performing at and above expert level in actual card play. 

2 A Bridge Example 

To show how problems arising in real-life games can be represented in the form 
of Figure 1, consider the example tree of Figure 2. This depicts a simple situation 
in a single suit of a game of Bridge. 



[Figure 2: initial state with North holding A Q and South holding 2] 



Fig. 2. The optimal MAX strategy for a single-suit Bridge problem represented 
(left) using constraints on the MIN branches, and (right) using possible worlds 
at leaf nodes 



Such single-suit problems are common in the Bridge literature; the task is 
to find the optimal way to play just the cards in one suit, ignoring possible 
influences from other suits (such as ruffing or entry requirements). It is also 
assumed that the opponents do not initiate play in the suit. In our example, a 
single MAX player controls both the North and the South cards (Ace, Queen, 
and 2) against the two defenders East and West (who are assumed to hold the 
King and the remaining nine low cards). 

To produce the tree of possible moves in Figure 2, we have made the natural 
simplification (which does not affect the analysis of the problem) that East and 
West have at most three distinct options at any point in the game: to play the 
King (‘K’), a low card (‘x’), or a card from a different suit (‘-’). Even then, 
however, the complete tree for this problem has 76 leaf nodes, so we have also 
made the further simplification of displaying only the MAX branches that form 
part of the optimal strategy. This optimal strategy starts by playing the two 
from the South hand, and then playing the Ace from the North hand if West plays 
the King or a card from a different suit. If West plays a low card, however, the 
Queen is played from the North hand (this is an example of a manoeuvre called 
a finesse). After a play by the second opponent (East), the highest of the four 
cards played to date is said to win a trick for either North-South or East-West, 
and the next trick begins. In our example, North-South will then either be able 
to gain a trick by cashing the Ace or the Queen, or will have no further options. 
The total number of North-South tricks is indicated at the leaf nodes of the tree. 

The game tree formed in this way is somewhat nonstandard in that all of the 
branches at MIN nodes can only be followed if certain conditions are true. For 
instance, East can only play the King if East actually holds the King. Applying 
the standard minimax algorithm to the tree without respecting these conditions 
results in a value of 1. However, we have some knowledge about the constraints 
on MIN’s available moves, namely that they are the result of a chance move 
(the deal of the cards). If we do not distinguish between the nine low cards, this 
chance move can result in 20 distinct possibilities (for either East or West, they 
may hold between 0 and 9 low cards, and either hold the King or not). Rather 
than list the payoffs for each of these twenty worlds in the figure, we have again 
simplified the presentation by instead considering the two mutually exclusive 
possibilities of “West holds the King” (w_1) and “East holds the King” (w_2). On 
the right of our figure, we have included a second game tree with payoffs at the 
leaf nodes for just these two worlds, using the symbol ⊥ to represent branches 
that cannot be followed. This shows that two tricks can always be made when 
West holds the King, but only one trick when East holds the King. 

The second game tree of Figure 2 is now in a format similar to that of 
Figure 1. Indeed, the only substantial difference is that some of the payoffs take 
the undefined value _L. In terms of basic game theory jTRIR| . both this tree and 
the tree of Figure 1 are just compact ways of representing the extensive form 
of a particular kind of two-person, zero-sum game with imperfect information. 
For the example of Figure 1, the extensive form tree would have a single chance 
move at the root, and n = 5 identically shaped subtrees (the payoff n-tuples 
can be assigned at each leaf node so that the ith component is the payoff for 
that leaf in the ith subtree). In the Bridge example, there is also a single chance 
move (the deal of the cards), but the moves that are possible in each world are 
different, giving rise to some ⊥ values as payoffs. 

So, game trees like those given in Figures 1 and 2 can be used to represent 
interesting and nontrivial games. The rest of this paper will therefore investigate 
the complexity of playing such games optimally, and introduce new algorithms 
for finding optimal strategies. Throughout our analysis of such games, we will 
assume that the only move with an unknown outcome is the chance move that 
starts the game. Thus, a single node in one of our trees represents between 1 
and n information sets: one if the player whose turn it is to move knows the 
exact outcome of the chance move, and n if the player has no knowledge. Since 
our analysis will make repeated reference to algorithms such as minimaxing, we 
will make one further notational convention. Throughout the paper, the normal 
min and max functions will be extended so that min(⊥, ⊥) = max(⊥, ⊥) = ⊥, 
and min(x, ⊥) = min(⊥, x) = max(x, ⊥) = max(⊥, x) = x for all x ≠ ⊥. 
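
The extended min and max convention can be illustrated with a minimal sketch. Representing the undefined payoff ⊥ by Python's None, and the names BOTTOM, ext_min and ext_max, are assumptions made here for illustration only; they do not come from the paper.

```python
from typing import Optional

# Assumption: None stands in for the undefined payoff (the symbol ⊥).
BOTTOM = None
Payoff = Optional[float]


def ext_min(x: Payoff, y: Payoff) -> Payoff:
    """min extended to ⊥: an undefined argument is ignored."""
    if x is BOTTOM:
        return y  # also covers min(⊥, ⊥) = ⊥
    if y is BOTTOM:
        return x
    return min(x, y)


def ext_max(x: Payoff, y: Payoff) -> Payoff:
    """max extended to ⊥: an undefined argument is ignored."""
    if x is BOTTOM:
        return y  # also covers max(⊥, ⊥) = ⊥
    if y is BOTTOM:
        return x
    return max(x, y)
```

With this reading, ⊥ acts as an identity element for both operations, so a branch that cannot be followed in some world never affects the value computed for that world.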

3 Review of Existing Algorithms 

The play of imperfect information games has been analysed by a number of re- 
searchers p n rc j - Here, we review the well-established technique of Monte-carlo 
sampling, and then describe two algorithms that we have recently demonstrated 
p. are better at finding optimal strategies. This section contains no new results, 
but since the developments in this field are relatively recent, we summarise them 
in some detail. 



3.1 Monte-Carlo Sampling 

One technique for handling imperfect information is Monte-carlo sampling 0. 
This approach has been used in games such as Scrabble (see 0) and Bridge, 
where it was proposed by Levy m and recently implemented by Ginsberg cm. 
In the context of game trees like that of Figure 1, Monte-carlo sampling consists 
of guessing a possible world and then finding a solution to the game tree for this 
complete information sub-problem. This is much easier than solving the original 
game, since if attention is restricted to just one world, the minimax algorithm 
can be used to find the best strategy. By guessing different worlds and repeating 
this process, it is hoped that an action that works well in a large number of 
worlds can be identified. 

To make this description more concrete, let us consider a general MAX node 
with branches M_1, M_2, ... in a game with n worlds. Now, let us say that we can 
find the minimax value, e_ij, of the node under branch M_i in world w_j with the 
minimax algorithm. We can then construct a scoring function, f, such as: 

$$f(M_i) = \sum_{\substack{j=1 \\ e_{ij} \neq \bot}}^{n} e_{ij} \Pr(w_j) , \tag{1}$$

where Pr(w_j) represents MAX’s assessment of the probability of the actual world 
being w_j. Monte-carlo sampling can then be viewed as selecting a move by using 
the minimax algorithm to generate values of the e_ij, and determining the M_i 
for which the value of f(M_i) is greatest. If there is sufficient time, all the e_ij 
can be generated, but in practice only some ‘representative’ sample of worlds is 
examined. 
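
The scoring function (1) can be sketched in a few lines. The tree encoding (a leaf is a tuple of payoffs with None for ⊥, an internal node is a pair of a kind and a list of children), the zero-based world indices, the function names, and the small three-world tree are all assumptions introduced for this illustration; ties are resolved in favour of the first branch rather than randomly.

```python
from fractions import Fraction

# Assumed encoding: a leaf is a tuple of payoffs, one per world (None for ⊥);
# an internal node is (kind, children) with kind "MAX" or "MIN".


def is_leaf(tree):
    return not isinstance(tree[0], str)


def minimax_in_world(tree, j):
    """Ordinary minimax value of `tree` when only world j is considered."""
    if is_leaf(tree):
        return tree[j]
    kind, children = tree
    values = [minimax_in_world(c, j) for c in children]
    values = [v for v in values if v is not None]  # drop unavailable branches
    if not values:
        return None
    return max(values) if kind == "MAX" else min(values)


def score(branch, probs, sample):
    """Equation (1), restricted to the sampled worlds."""
    total = Fraction(0)
    for j in sample:
        e = minimax_in_world(branch, j)
        if e is not None:
            total += e * probs[j]
    return total


def monte_carlo_choice(max_node, probs, sample):
    """Return (index of best-scoring branch, list of scores)."""
    _, children = max_node
    scores = [score(c, probs, sample) for c in children]
    return scores.index(max(scores)), scores


# Hypothetical three-world tree, used only to exercise the functions.
d = ("MAX", [(1, 0, 0)])
e = ("MAX", [(1, 0, 0), (0, 1, 1)])
b = ("MIN", [d, e])
c = ("MIN", [("MAX", [(0, 0, 0)])])
root = ("MAX", [b, c])
probs = [Fraction(1, 3)] * 3
choice, scores = monte_carlo_choice(root, probs, sample=[0, 1, 2])
```

Here every world is sampled, which corresponds to the limiting case in which all the e_ij are generated; passing a shorter `sample` list models the practical case of a representative subset.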

As an example, consider how the tree of Figure 1 is analysed by the above 
characterisation of Monte-carlo sampling. If we examine world w_1, the minimax 
values below node a are as shown in Figure 3 (these correspond to e_11 and e_21 
for this tree). It is easy to check that the minimax value at node b is again 1 if we 
examine any of the remaining worlds, and that the value at node c is 1 in worlds 
w_2, w_3, and w_4, but 0 in world w_5. Thus, if we assume equally likely worlds (that 
is, the function Pr returns 1/5 for each world), Monte-carlo sampling using (1) 
to make its branch selection will choose the left-hand branch at node a whenever 
world w_5 is included in its sample. 

Fig. 3. Finding the minimax value of world w_1 

Unfortunately, this is not the best strategy for this tree. The best return 
that MAX can hope for when making this choice of the left-hand branch at 
node a is a payoff of 1 in just three worlds (under the best defence model, and 
indeed for any reasonable assumptions about a rational opponent). Choosing the 
right-hand branch at node a, however, offers a payoff of 1 in four worlds. 

The reason for this error can be understood by examining the implicit as- 
sumptions made by a Monte-carlo sampling approach. At first sight, the algo- 
rithm appears to model very closely the best defence assumptions given in the 
Introduction. For instance, it identifies pure MAX strategies that make no use of 
probabilities (A-III). Moreover, by repeatedly applying minimaxing it is assumed 
that MIN can respond optimally to MAX’s moves in each individual world, for 
which perfect knowledge of the world state is a prerequisite ( A-I) . 

However, as well as perfect knowledge for MIN, the repeated application of 
minimaxing also makes the assumption that MAX will be able to play optimally 
in each individual world. It is this false assumption that leads to incorrect 
analyses such as that in Figure 3. At nodes d and e in this example, MAX makes 
different choices in different worlds. In reality, however, MAX can only attach 
probabilities to the state of the world and must make a single choice for all 
worlds at node d and another single choice for all worlds at node e. Combining 
the minimax values of separate choices results in an over-optimistic analysis of 
node b. This is an example of the problem of strategy fusion formalised recently 
in 0: the false assumption that MAX can play optimally in each world allows 
the results of different moves — or strategies — to be ‘fused’ together. 

3.2 Vector Minimaxing 

In previous work 0 , we have shown how to remove strategy fusion from Monte- 
carlo sampling with a vector minimaxing algorithm. This algorithm ensures that 
at any MAX node a single branch is chosen in all worlds. To do this, it requires a 
function payoff-vector, defined over the leaf nodes of the game tree. For any leaf 
node, v, payoff-vector(v) returns an n-element vector K such that K[j] (the jth 
element of K) takes the value of the payoff at v in world w_j (1 ≤ j ≤ n). Vector 
minimaxing then uses these payoff vectors as shown in Figure 4 to identify a 
strategy for a tree t, where sub(t) computes the set of t's immediate subtrees. 



Algorithm vector-mm(t): 

Take the following actions, depending on t. 

Condition                  Result 
t is a leaf node           payoff-vector(t) 
root of t is a MIN node    min over t_i ∈ sub(t) of vector-mm(t_i) 
root of t is a MAX node    max over t_i ∈ sub(t) of vector-mm(t_i) 

Fig. 4. The vector minimaxing algorithm 



In this algorithm, the normal min and max functions are extended so that 
they are defined over a set of payoff vectors. The max function returns the single 
vector K for which 

$$\sum_{\substack{j=1 \\ K[j] \neq \bot}}^{n} \Pr(w_j)\, K[j] \tag{2}$$

is maximum, resolving equal choices randomly. In this way, vector minimaxing 
commits to just one choice of branch at each MAX node, avoiding strategy fusion 
(the actual strategy selected by the algorithm is just the set of the choices made 
at the MAX nodes). As for the min function, for a node with m branches and 
therefore m payoff vectors K_1, ..., K_m to choose between, a player with perfect 
knowledge of the state of the world is modelled as follows: 



$$\min_{i} K_i = \Bigl( \min_{i} K_i[1],\ \min_{i} K_i[2],\ \ldots,\ \min_{i} K_i[n] \Bigr) . \tag{3}$$

That is, the min function returns a vector in which the payoff for each possible 
world is the lowest possible. 
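
A minimal sketch of vector minimaxing, following the table of Figure 4 and the definitions (2) and (3), might look as follows. The tree encoding, the function names and the example tree are assumptions made for illustration, and equal choices at MAX nodes are resolved by taking the first maximal vector instead of choosing randomly.

```python
from fractions import Fraction

# Assumed encoding: a leaf is a tuple of payoffs, one per world (None for ⊥);
# an internal node is (kind, children) with kind "MAX" or "MIN".


def is_leaf(tree):
    return not isinstance(tree[0], str)


def expected(vec, probs):
    """The quantity maximised in (2): expected payoff over defined entries."""
    return sum(p * k for p, k in zip(probs, vec) if k is not None)


def vec_min(vectors):
    """Equation (3): component-wise minimum, ignoring ⊥ entries."""
    out = []
    for column in zip(*vectors):
        defined = [k for k in column if k is not None]
        out.append(min(defined) if defined else None)
    return tuple(out)


def vector_mm(tree, probs):
    """Payoff vector of the strategy selected by vector minimaxing."""
    if is_leaf(tree):
        return tuple(tree)
    kind, children = tree
    results = [vector_mm(c, probs) for c in children]
    if kind == "MIN":
        return vec_min(results)
    # MAX commits to one branch for all worlds (first maximal one on ties).
    return max(results, key=lambda k: expected(k, probs))


# Hypothetical three-world tree, used only to exercise the functions.
d = ("MAX", [(1, 0, 0)])
e = ("MAX", [(1, 0, 0), (0, 1, 1)])
b = ("MIN", [d, e])
c = ("MIN", [("MAX", [(0, 0, 0)])])
root = ("MAX", [b, c])
probs = [Fraction(1, 3)] * 3
```

On this tree `vector_mm(e, probs)` selects (0, 1, 1) because its expected value of 2/3 beats 1/3, and the MIN node above then reduces the result to (0, 0, 0).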

As an example of vector minimaxing in practice, Figure 5 shows how the 
algorithm would analyse the tree of Figure 1, using ovals to represent the vectors 
produced at each node. The branches selected by (2) (assuming equally likely 
worlds) have been highlighted in bold. At node d, for example, the right-hand 
branch is chosen because its evaluation of 3/5 is higher than the 2/5 of the 
left-hand branch. Eventually, at the root of the tree, the right-hand branch is 
correctly chosen. 




Fig. 5. Vector minimaxing applied to example tree 



3.3 Payoff-Reduction Minimaxing 

Vector minimaxing offers slight accuracy improvements over Monte-carlo sam- 
pling (again, see [BJ). However, both Monte-carlo sampling and vector minimax- 
ing suffer from the problem of non-locality [B|. This problem is illustrated in 
Figure 6, which depicts a game tree with just 3 worlds. 

Against best defence, the optimal strategy for MAX is to choose the left-hand 
branch at both node a and node e. This guarantees a payoff of 1 in world w_1. 
In the figure, however, we have annotated the tree to show how it is analysed 
by vector minimaxing. The branches in bold show that the algorithm would 
choose the right-hand branch at node e. The vector produced at node b correctly 
indicates that when MAX makes this selection, a MIN player who knows the 
world state will always be able to restrict MAX to a payoff of 0 (by choosing 
the right-hand branch at node b in world w_1 and the left-hand branch in worlds 
w_2 and w_3). Thus, at the root of the tree, both subtrees have the same analysis, 
and vector minimaxing never wins on this tree. Applying Monte-carlo sampling 
to the same tree, in the limiting case where all possible worlds are examined, 
we see that node b has a minimax value of 1 in world w_1, so that the left-hand 
branch would be selected at the root of the tree. However, the same selection as 
vector minimaxing will then be made when subsequently playing at node d or 
node e. Thus, neither Monte-carlo sampling nor vector minimaxing chooses the 
best strategy against best defence on this tree. The choice made at node e is 
incorrect because the situation in a different (non-local) subtree rooted at node 
d makes it impossible to actually achieve some of the payoffs under node e. 

Fig. 6. Example tree with three worlds 

To tackle the problem of non-locality, we have previously described |Ej an 
algorithm called payoff-reduction minimaxing, or prm. This algorithm is shown 
in its simplest form in Figure 7. The reduction in the second step of this algorithm 
addresses the problem of non-locality by, in effect, parameterising the payoffs at 
each leaf node with information on the results obtainable in other portions of 
the tree. By using minimax values for this reduction, the game-theoretic value 
of the tree in each individual world is left unaltered, since no payoff is reduced 
to the extent that it would offer MIN a better branch selection at any node in 
any world. 

Algorithm prm(t): 

1. Use the standard minimax algorithm to conduct minimaxing of t in every world 
w_k. For every MIN node in t, record its minimax value in each world, m_k. 

2. Examine the payoff vectors at each leaf node of t. Reduce the (non-⊥) payoffs p_k in 
each world w_k to the minimum of p_k and all the m_k of the node’s MIN ancestors. 

3. Apply the vector-mm algorithm to the resulting tree. 

Fig. 7. Simple form of the prm algorithm 

As an example, consider how the prm algorithm would behave on the tree of 
Figure 6. The minimax value of node c is zero in every world, but all the payoffs 
at node f are also zero, so no reduction is possible. At node b, however, the 
minimax values in the three possible worlds are 1, 0, and 0, respectively. Thus, 
all the payoffs in each world at the leaf nodes under d and e are reduced to at 
most these values. This leaves only the two payoffs of 1 in world w_1 as shown in 
Figure 8, where the strategy selection subsequently made by vector-minimaxing 
has also been highlighted in bold. In this tree, then, the prm algorithm results 
in the correct strategy being chosen. 
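
The three steps of the simple form of prm can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the tree encoding, all function names, and the exact shape of the example tree (modelled on the description of the three-world tree, with node d assumed to have a single leaf) are introduced here, and the per-world minimax values are recomputed at every MIN node for brevity instead of being recorded in one pass.

```python
from fractions import Fraction

# Assumed encoding: a leaf is a tuple of payoffs, one per world (None for ⊥);
# an internal node is (kind, children) with kind "MAX" or "MIN".


def is_leaf(tree):
    return not isinstance(tree[0], str)


def ext_min(x, y):
    if x is None:
        return y
    if y is None:
        return x
    return min(x, y)


def minimax_in_world(tree, k):
    if is_leaf(tree):
        return tree[k]
    kind, children = tree
    values = [v for v in (minimax_in_world(c, k) for c in children) if v is not None]
    if not values:
        return None
    return max(values) if kind == "MAX" else min(values)


def vector_mm(tree, probs):
    if is_leaf(tree):
        return tuple(tree)
    kind, children = tree
    results = [vector_mm(c, probs) for c in children]
    if kind == "MIN":
        out = results[0]
        for r in results[1:]:
            out = tuple(ext_min(a, b) for a, b in zip(out, r))
        return out
    return max(results, key=lambda v: sum(p * x for p, x in zip(probs, v) if x is not None))


def reduce_payoffs(tree, caps):
    """Steps 1 and 2: cap each non-⊥ leaf payoff by the minimax values m_k
    of all MIN ancestors (caps[k] is None when there is no cap yet)."""
    if is_leaf(tree):
        return tuple(p if p is None else ext_min(p, c) for p, c in zip(tree, caps))
    kind, children = tree
    if kind == "MIN":
        m = [minimax_in_world(tree, k) for k in range(len(caps))]
        caps = tuple(ext_min(c, mk) for c, mk in zip(caps, m))
    return (kind, [reduce_payoffs(ch, caps) for ch in children])


def prm(tree, probs):
    """Step 3: vector minimaxing on the reduced tree."""
    return vector_mm(reduce_payoffs(tree, (None,) * len(probs)), probs)


# Tree modelled on the three-world example (shape of d and c is an assumption).
d = ("MAX", [(1, 0, 0)])
e = ("MAX", [(1, 0, 0), (0, 1, 1)])
root = ("MAX", [("MIN", [d, e]), ("MIN", [("MAX", [(0, 0, 0)])])])
probs = [Fraction(1, 3)] * 3
reduced = reduce_payoffs(root, (None, None, None))
```

On this tree the payoffs of 1 in the second and third worlds under node e are reduced to zero, so the subsequent vector minimaxing step keeps the leaf (1, 0, 0) at node e.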




Fig. 8. Applying vector-mm after payoff reduction 



4 Complexity Analysis of Optimal Play 

Here we show that, given a game tree, finding optimal strategies under the best 
defence model is an NP-complete problem. Hence, unless P = NP, heuristics are 
required to tackle this problem in practice. As explained in the Introduction, 
this is fundamentally different from perfect information games (such as n-by-n 
checkers) which, although PSPACE-hard in the size of the initial game configu- 
ration, can be solved in linear time in the size of the game tree 19|. In contrast, 
our results show that for the problems we are considering, even if the complete 
game tree of a game is small enough to be searched by computer, finding optimal 
strategies may be infeasible. 

Note that there have been previous analyses of the complexity of finding 
optimal strategies in imperfect information games cm but these proofs are 
not applicable when considering the best defence model. For example, [2] show 
that it is NP-complete to determine whether or not an n-player game has a 
pure strategy equilibrium point, but the proof uses a reduction (from the 
3-Partition problem) that cannot be reproduced with trees such as that of Figure 1. 
Moreover, the notion of a pure strategy equilibrium point is also not helpful in 
the best defence model, since A-II introduces an asymmetry: MIN’s strategy 
is chosen after MAX has made a decision, and thus a ‘stable’ strategy pair is 
always found by simply finding the optimal response to any MAX selection. 

For our proof, we first formalise the relevant problems in the format of [3]. 

Best Defence: 

Instance: A game tree t over n worlds and a positive integer k ≤ n. 

Question: Is there a MAX strategy that returns a payoff of 1 in at least k 
worlds under the best defence model? 

Clique: 

Instance: A graph G = (V, E) and a positive integer k ≤ |V|. 

Question: Does G contain a clique of size k or more? 

Note that here and later, we assume that the payoffs are bounded so that 
storage is possible in constant space, and we measure the size of a game tree 
t to be the number of nodes in t plus the number of payoffs listed (which is 
the product of the number of leaf-nodes and the number of worlds) . Given that 
Clique is NP-complete [Jj, we now prove: 

Theorem 1. Best Defence is NP-complete. 

Proof: To see that Best Defence is in NP, observe that given a game tree t 
we can guess a MAX strategy, s (e.g., by specifying the branch to be chosen at 
each MAX node in the tree), and correctly determine the optimal payoff in time 
linear in the size of t. This can be done with an algorithm that is very similar 
to vector-mm. The only modification required is at MAX nodes, where, rather 
than using the max operator of (2), the payoff on the branch specified by s is 
returned. |E] have shown that this algorithm correctly computes the payoff of s, 
and the time taken is clearly linear in the size of the tree. 

To show NP-hardness we reduce Clique to Best Defence. Let G = (V, E) 
and k be given. We translate G to a tree t, with n = |V| worlds, w_1, ..., w_n, 
constructed to have a payoff of 1 in at least k worlds iff G has a k clique. The 
root of t is a MIN node. The next layer has n MAX nodes, which we label v_1, 
..., v_n, for v_i ∈ V. At each MAX node v_i, there is a left and right branch, called 
l_i and r_i respectively. The payoff at the leaf node under each l_i is 1 in the jth 
world iff i = j or (v_i, v_j) ∈ E. The payoff at the leaf node under each r_i is 1 in 
the jth world iff i ≠ j. An example of a graph and its translation are given in 
Figure 9. Note that the reduction is trivially computable in time polynomial in 
the size of G. 

Fig. 9. Graph (top) and corresponding game tree (bottom) 

Let us call a vertex v_i selected if MAX chooses the left branch at v_i in his 
strategy. Suppose G has a k clique. The clique defines a subset V' ⊆ V where 
k = |V'| and for each v_i, v_j ∈ V', where i ≠ j, (v_i, v_j) ∈ E. It is easy to see that 
the MAX strategy that selects the vertices in V' has a payoff of 1 in each world 
w_i, where v_i ∈ V'. Hence this strategy has a payoff of at least k. 

Conversely, suppose there is a strategy for MAX with a payoff of at least k. 
Let W be the set of worlds in which MAX’s strategy yields a payoff of 1 and let 
V' = {v_j | w_j ∈ W}. Observe that V' comprises a clique in G of size at least k 
since for each world w_j ∈ W, every selected vertex v_i ∈ V', i ≠ j, must have a 
payoff of 1 in world w_j, which implies that (v_i, v_j) ∈ E. QED 

In our example, we can select v_1, v_2, and v_3 and MIN’s best strategy yields 
a 1 for MAX in worlds w_1, w_2 and w_3, and a 0 in the remaining two. 
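
The reduction used in the proof can be checked on small graphs with a short sketch. The five-vertex graph below is hypothetical (it is not the graph of Figure 9), vertices and worlds are numbered from zero, and the function names are assumptions. The brute-force search over all 2^n selections is only there to confirm the correspondence on a tiny instance; it is exponential, as the theorem leads us to expect.

```python
from itertools import product


def clique_to_tree(n, edges):
    """Build the two-level tree of the reduction: a MIN root over n MAX nodes,
    each with a left leaf l_i and a right leaf r_i (payoff vectors over n worlds)."""
    adjacent = {frozenset(e) for e in edges}
    max_nodes = []
    for i in range(n):
        left = tuple(1 if i == j or frozenset((i, j)) in adjacent else 0 for j in range(n))
        right = tuple(1 if i != j else 0 for j in range(n))
        max_nodes.append(("MAX", [left, right]))
    return ("MIN", max_nodes)


def strategy_payoff(tree, choices):
    """Payoff vector of a fixed MAX strategy against best defence.
    choices[i] is 0 to select v_i (left branch) and 1 otherwise."""
    _, max_nodes = tree
    picked = [node[1][c] for node, c in zip(max_nodes, choices)]
    # MIN knows the world, so each world gets the worst picked payoff.
    return tuple(min(column) for column in zip(*picked))


def best_defence_value(tree):
    """Largest number of worlds with payoff 1 over all MAX strategies."""
    n = len(tree[1])
    return max(sum(strategy_payoff(tree, ch)) for ch in product((0, 1), repeat=n))


# Hypothetical graph: a triangle 0-1-2 with a tail 2-3-4 (largest clique has size 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
tree = clique_to_tree(5, edges)
```

Selecting the three triangle vertices, `strategy_payoff(tree, (0, 0, 0, 1, 1))`, gives a payoff of 1 in exactly the three corresponding worlds, and no strategy does better on this graph.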

5 New Heuristics 

We have shown that the problem of finding optimal strategies for the best defence 
model is NP-complete in the size of the game tree. To date, the most consistently 
accurate heuristic solution has been the prm algorithm, reviewed in §3.3. Here, 
we introduce two new heuristics, both with similar motivations to that of prm: 
the reduction of non-locality through the introduction of dependencies between 
the choices at MAX nodes. 



5.1 Beta-Reduction (and Branch Ordering) 

Our first new heuristic takes as its inspiration the well-known procedure of alpha- 
beta pruning. The alpha-beta technique is used to speed up the search of a perfect 
information game tree by maintaining cutoff values that are used to decide, based 
on the search so far, whether a new node can affect the root value of the tree. 
There are always two values: an alpha value that can never decrease, and a beta 
value that can never increase. A simple extension of this technique to game trees 
with multiple payoffs at the leaf nodes is shown in Figure 10. Here, the alpha 
and beta values are n-tuples and the max and min functions are as defined in 
(2) and (3). The min function is also used to represent the pruning criterion, as 
min(α, β) = β. This is a simple expedient for dealing with the possibility that 
some payoff values may take the undefined value ⊥. 



Algorithm vm-αβ(t, α, β): 

Take the following actions, depending on t. 

Condition                  Result 
t is a leaf node           payoff-vector(t) 
root of t is a MIN node    for each t_i ∈ sub(t) do 
                               β ← min(β, vm-αβ(t_i, α, β)) 
                               if min(α, β) = β then return α 
                           end 
                           return β 
root of t is a MAX node    for each t_i ∈ sub(t) do 
                               α ← max(α, vm-αβ(t_i, α, β)) 
                               if min(α, β) = β then return β 
                           end 
                           return α 



Fig. 10. Vector minimaxing with alpha-beta pruning 



For perfect information games, the alpha-beta algorithm represents a more 
efficient technique for computing the same value as standard minimax. With 
vm-αβ, however, it is not only efficiency that may be improved, but also accuracy. 
That is, vm-αβ will not, in general, return the same value as vector-mm. For an 
illustration of this, let us look again at the tree of Figure 6, introduced in §3.3. 
Figure 11 shows how this tree is analysed by vm-αβ. When node d is examined, 
it produces an alpha value of (1,0,0), which then becomes the beta of node b. 
This beta value is then passed down to node b's next daughter. At node e, the 
first daughter is a leaf node and the alpha value of node e is therefore set to the 
leaf node values (1,0,0). Now, this alpha value is at least as good as the beta 
value of b (that is, min(α, β) is now equal to β), so the remaining branches at 
node e can be pruned (beta pruning). Thus, vm-αβ selects the correct strategy 
on this tree, whereas we saw in §3.3 that vector-mm is sub-optimal. 

Fig. 11. A beta-pruning carried out by the vm-αβ algorithm on a simple tree 

The explanation for vm-αβ’s superiority here is that the beta-pruning at node 
e in effect tackles the non-local nature of this game tree by preventing the second 
(sub-optimal) branch at node e from being examined. This is a simple example 
of a general effect. Non-locality occurs when choices are made at internal MAX 
nodes without reference to other subtrees. Since pruning decreases the number 
of MAX nodes that are actually examined, it also decreases the chance that 
non-local effects will lead to errors. 

Although vm-αβ represents an improvement over vector-mm on this particular 
example, it is not hard to produce a modified tree for which both algorithms 
find the same, incorrect solution. For instance, the small change of increasing 
by one the payoff under node d in world w_1 leaves the optimal strategy and its 
payoff unchanged. However, vm-αβ (and also vector-mm) will not be able to find 
this strategy. Even for this modified tree, though, there is an adaptation of the 
alpha-beta technique that does improve accuracy. To see this, we simply need 
to realise that, when using vm-αβ, the branch selections made during a search 
are constantly reflected in the alpha and beta values passed to any node. These 
values therefore offer a natural way of tackling non-locality, by ensuring that 
payoffs rendered unachievable by branch selections in the analysed portion of 
the tree do not adversely affect the selections in the remainder of the search. 

In particular, any beta value, β, generated at a MIN node, v, can be used 
to reduce non-locality at MAX nodes in any subtree of v that has yet to be 
examined. Since MIN chooses the best play in each individual world, each value 
β[j] imposes a limit (that cannot increase) on the value of the optimal payoff 
that can now be obtained in world w_j. Thus, when making a new selection at 
a MAX node in a subtree of v, all the payoffs of each K_i in world w_j should 
be reduced to at most β[j]. A simple way to implement this observation is to 
modify the result returned at leaf nodes in the algorithm of Figure 10 to the 
following: 

$$\min(\text{payoff-vector}(t), \beta) , \tag{4}$$

where min is again as defined in (3). Let us call the algorithm produced by this 
modification vm-beta, and the reductions of leaf node payoffs made by (4) 
beta-reductions. This new algorithm can correctly solve the tree of Figure 11 even 
when the payoff in world w_1 at node d is greater than one, since the payoffs of 
1 in worlds w_2 and w_3 are beta-reduced to zero. 

Since beta-reduction only utilises information about branches already selected, 
it is sensitive to branch ordering. For example, consider the effect of 
swapping the order of the branches at node b in Figure 11. The choice between 
the vectors (1,0,0) and (0,1,1) at node e would then have to be made before 
realising that the payoffs of 1 in w_2 and w_3 could not be achieved. Thus no 
beta-reductions (or, of course, beta prunings) would be possible, and the optimal 
strategy for this reordered tree would not be found. However, note that vm-beta 
is still correct (unlike vm-αβ) if we simply swap the two branches at node e. 

Of course, standard (perfect information) alpha-beta pruning is also affected 
by branch ordering, at least in terms of efficiency. It is well known that the 
optimal branch ordering for the algorithm is for MAX’s better moves to come 
first at MAX nodes, and for MIN’s better moves to come first at MIN nodes. 
In fact, the same holds for vm-beta , but in terms of accuracy it is the ordering 
at MIN nodes that is most important; the earlier in the search that the payoff 
vectors with relatively small values are encountered, the more likely that beta- 
reductions will become possible. 

Note that the prm algorithm reviewed in §3.3 can find optimal solutions for 
the tree of Figure 11 irrespective of branch ordering. To show that other trees 
exist for which vm-beta actually out-performs prm, then, consider the example of 
Figure 12. Here, the single MIN node has a minimax value of one in every world. 
Thus, prm cannot reduce any leaf node payoffs, and will therefore produce the 
same strategy (shown in the figure) as vector minimaxing. It is easy to see that 
this strategy has a payoff of zero in every world. Employing beta-reduction, on 
the other hand, will guarantee a payoff of 1 in either world w_2 or w_3 (depending 
on the random choice made at the second MAX node). That prm and vm-beta 
can find optimal strategies on different trees suggests the creation of a hybrid 
prm-beta algorithm. This is easily done by simply replacing vector-mm with 
vm-beta in step three of the prm algorithm of Figure 7. The insensitivity of the 
payoff reduction technique to branch ordering allows this hybrid algorithm to 
benefit fully from an ordering that favours beta-reduction. 

Leaf payoffs in Figure 12 (one column per leaf, left to right): 

w_1  1 0 0 1 0 1 
w_2  0 1 0 1 1 0 
w_3  0 1 1 0 0 1 

Fig. 12. An incorrect strategy selected by vector-mm (and by prm) on a tree 
correctly analysed by vm-beta 
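
The algorithm of Figure 10 and the beta-reduction of (4) can be sketched together, with a flag switching the leaf reduction on. Several details are assumptions made here and not stated in the text: the initial alpha and beta vectors are filled with minus and plus infinity, ties at MAX nodes keep the earlier vector rather than choosing randomly, ⊥ leaf entries are left undefined by the reduction, and the example trees are only modelled on the descriptions of Figures 11 and 12 (in particular, the grouping of the six Figure 12 leaves in pairs under three MAX nodes is inferred).

```python
import math

# Assumed encoding: a leaf is a tuple of payoffs, one per world (None for ⊥);
# an internal node is (kind, children) with kind "MAX" or "MIN".


def is_leaf(tree):
    return not isinstance(tree[0], str)


def expected(vec, probs):
    return sum(p * k for p, k in zip(probs, vec) if k is not None)


def vec_min(a, b):
    """Component-wise min of two vectors, ignoring ⊥ entries as in (3)."""
    return tuple(y if x is None else x if y is None else min(x, y) for x, y in zip(a, b))


def vm_ab(tree, alpha, beta, probs, beta_reduce=False):
    if is_leaf(tree):
        if beta_reduce:  # equation (4); ⊥ entries stay undefined (assumption)
            return tuple(p if p is None else min(p, b) for p, b in zip(tree, beta))
        return tuple(tree)
    kind, children = tree
    for child in children:
        value = vm_ab(child, alpha, beta, probs, beta_reduce)
        if kind == "MIN":
            beta = vec_min(beta, value)
        elif expected(value, probs) > expected(alpha, probs):
            alpha = value  # strict test: the earlier vector wins ties
        if vec_min(alpha, beta) == beta:  # pruning criterion min(α, β) = β
            return alpha if kind == "MIN" else beta
    return beta if kind == "MIN" else alpha


def solve(tree, probs, beta_reduce=False):
    n = len(probs)
    return vm_ab(tree, (-math.inf,) * n, (math.inf,) * n, probs, beta_reduce)


def three_world_tree(d_leaf, e_leaves):
    b = ("MIN", [("MAX", [d_leaf]), ("MAX", list(e_leaves))])
    return ("MAX", [b, ("MIN", [("MAX", [(0, 0, 0)])])])


probs = [1 / 3] * 3
fig11 = three_world_tree((1, 0, 0), [(1, 0, 0), (0, 1, 1)])
modified = three_world_tree((2, 0, 0), [(1, 0, 0), (0, 1, 1)])
swapped = three_world_tree((1, 0, 0), [(0, 1, 1), (1, 0, 0)])
fig12 = ("MIN", [("MAX", [(1, 0, 0), (0, 1, 1)]),
                 ("MAX", [(0, 0, 1), (1, 1, 0)]),
                 ("MAX", [(0, 1, 0), (1, 0, 1)])])
```

Under these assumptions, `solve(modified, probs)` returns (0, 0, 0) while `solve(modified, probs, beta_reduce=True)` returns (1, 0, 0), matching the behaviour described for the tree with the increased payoff under node d. The hybrid prm-beta algorithm would apply the same `solve(..., beta_reduce=True)` call to a tree whose leaves have first been reduced by prm.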



5.2 Iterative Biasing 

In our complexity proof of §4, we showed that when a MAX strategy is fixed, its 
payoff can be found in time linear in the size of the game tree. This suggests a 
heuristic for finding good strategies: simply guess a strategy (or even a partial 
strategy) and then check the strategy’s payoff. This guessing can be repeated 
until an answer is demanded, at which point the guess with the best evaluation 
can be returned. However, given our extensive form game tree, we know (see, e.g., 
0) that when the nodes have at least binary branching the number of strategies 
for MAX is exponential in the size of tree (doubly exponential in its depth). 
Finding good guesses among such a large number of possibilities is unlikely to 
be a practical proposition in general. 

However, there is something else besides strategies that can be guessed: pay- 
offs. In fact, given an optimal payoff vector, K max , we can efficiently find an 
optimal strategy for MAX. To see this, consider a game tree with payoff vec- 
tors K at the leaf nodes. Assume it is known that the optimal payoff vector for 
this game is K max . We then compute an optimal strategy (which may not be 
unique, as there could be more than one optimal payoff vector and each such 
vector could also result from more than one strategy) with the following steps, 
which run in time linear in the size of the game tree: 
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1. Compare the payoff vector K at each leaf with K max . Replace the vector 
with the integer 1 if K is at least as good for MAX as K max (that is, if 
min (K,K max ) = K max ), and 0 otherwise. 

2. The optimal strategy is the one returned for MAX by applying standard 
minimax to the resulting tree. 

That this procedure is correct can be shown by first observing that the mini- 
max step must find a strategy with a payoff of one. If this was not the case there 
would be no strategy in the original tree returning a payoff that is at least as 
good as K max , contradicting that K max is an optimal payoff. Now observe that 
a payoff of one means that the strategy yields a payoff at least as good as K max 
on the original tree. QED. 
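
The two steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the tuple-based tree encoding (`("LEAF", vector)`, `("MAX", children)`, `("MIN", children)`) and the function names are assumptions made for the example, not notation from the paper.

```python
def threshold(tree, k_max):
    """Step 1: replace each leaf vector K by 1 if K >= K_max in every world, else 0."""
    kind, body = tree
    if kind == "LEAF":
        return ("LEAF", 1 if all(a >= b for a, b in zip(body, k_max)) else 0)
    return (kind, [threshold(child, k_max) for child in body])


def minimax(tree):
    """Step 2: standard minimax on the 0/1 tree.

    Returns (value, strategy). The strategy is None at a leaf,
    (child_index, sub_strategy) at a MAX node, and a list with one
    sub-strategy per MIN reply at a MIN node.
    """
    kind, body = tree
    if kind == "LEAF":
        return body, None
    results = [minimax(child) for child in body]
    if kind == "MIN":
        return min(v for v, _ in results), [s for _, s in results]
    best = max(range(len(results)), key=lambda i: results[i][0])
    return results[best][0], (best, results[best][1])


def optimal_strategy(tree, k_max):
    """Both steps together; each node is visited a constant number of times."""
    return minimax(threshold(tree, k_max))


# Hypothetical two-world example in which K_max = (1, 0) is assumed known.
tree = ("MAX", [
    ("MIN", [("LEAF", (1, 0)), ("LEAF", (0, 1))]),
    ("MIN", [("LEAF", (1, 1)), ("LEAF", (1, 0))]),
])
value, strategy = optimal_strategy(tree, k_max=(1, 0))
```

In this toy tree only the second MAX branch keeps every reachable leaf at least as good as the assumed K max, so the minimax step returns a value of one and selects that branch.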

If we are playing a game where the leaf payoffs come from a finite m-element 
domain (e.g., natural numbers between 0 and m - 1), the space of possible payoff 
vectors has size m^n. Like the total number of strategies, this is exponential, 
but now the exponent is different: it is the number of worlds n. Thus, whereas 
guessing strategies may not be practical, guessing payoff vectors may be more 
feasible. In single-suit Bridge problems, for example, redundancies in the domain 
often reduce the number of significant worlds to a manageable number (such as 
the twenty worlds of the problem in Figure El produced by treating the low cards 
as indistinguishable) . 

We suggest the basic approach of guessing a single element of the optimal 
payoff vector to be some value $v$ (i.e., guessing that $K_{max}[k] = v$ for a 
particular world $w_k$). This guess can then be passed to a modified version of 
vector minimaxing that uses it to bias the search. This biasing is achieved by 
defining a new function, $\max_{v,k}$, to replace the definition of max in the 
vector-mm algorithm. The $\max_{v,k}$ function returns from amongst a set of 
vectors the one that is best according to the relation $>_{v,k}$ defined below. 

Definition 1 (Biasing relation). For any two payoff vectors, $K_1$ and $K_2$, we 
say that $K_1 >_{v,k} K_2$ if and only if either of the following hold: 

- the vector $K_1$ offers a payoff of at least $v$ in world $w_k$ but the vector 
$K_2$ does not, or 

- if neither of $K_1$ or $K_2$ offers a payoff of at least $v$ in $w_k$, or if both 
$K_1$ and $K_2$ offer a payoff of at least $v$ in $w_k$, then $K_1$ must be superior 
to $K_2$ based on an expected value computation on the remaining worlds. That is, 

\[ \sum_{i=1}^{n} K_1[i] \Pr(w_i) > \sum_{i=1}^{n} K_2[i] \Pr(w_i). \]

This definition is designed to bias a search so that, wherever possible, a 
branch with a payoff greater than or equal to $v$ in world $w_k$ is selected. Given 
some finite set, $S$, of guesses for the pair of values $\{v, k\}$, we can then repeat 
the search with different biases, a technique we call iterative biasing. 
Specifically, we can create the iterative vector minimaxing (or ivm) algorithm of 
Figure 13. 

Algorithm ivm(t, S) 

    Given S = {{v_1, k_1}, {v_2, k_2}, ...}, 
    for each {v_j, k_j} ∈ S 
        compute s_j = biased-vm(t, v_j, k_j) 
    end 
    return the s_j that represents the best expected payoff 

Here, biased-vm(t, v, k) takes the following actions, depending on t. 

    Condition                  Result 

    t is leaf node             payoff-vector(t) 

    root of t is a MIN node    min over t_i ∈ sub(t) of biased-vm(t_i, v, k) 

    root of t is a MAX node    max_{v,k} over t_i ∈ sub(t) of biased-vm(t_i, v, k) 

Fig. 13. Iterative biasing, as carried out by the iterative vector minimaxing 
algorithm 
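
As a rough illustration of Figure 13, the following Python sketch implements the biasing relation and the ivm loop on a small hand-made tree. Several details are assumptions made for the example rather than facts from the paper: the tuple tree encoding, the 0-based world index `k`, the use of all worlds in the expected-value tie-break, and the choice to return payoff vectors rather than explicit strategies.

```python
def expected(vec, probs):
    """Expected payoff of a vector under the world probabilities."""
    return sum(x * p for x, p in zip(vec, probs))


def better(k1, k2, v, k, probs):
    """Biasing relation K1 >_{v,k} K2 (world index k is 0-based here)."""
    hit1, hit2 = k1[k] >= v, k2[k] >= v
    if hit1 != hit2:
        return hit1  # exactly one vector reaches v in world k
    # Assumption: the tie-break sums over all worlds.
    return expected(k1, probs) > expected(k2, probs)


def biased_vm(tree, v, k, probs):
    """Vector minimaxing with max replaced by the biased max_{v,k}."""
    kind, body = tree
    if kind == "LEAF":
        return tuple(body)
    vecs = [biased_vm(child, v, k, probs) for child in body]
    if kind == "MIN":
        # Best defence: MIN takes the worst outcome for MAX in every world.
        return tuple(min(col) for col in zip(*vecs))
    best = vecs[0]
    for cand in vecs[1:]:
        if better(cand, best, v, k, probs):
            best = cand
    return best


def ivm(tree, guesses, probs):
    """Run one biased search per guess and keep the best expected payoff."""
    results = [biased_vm(tree, v, k, probs) for v, k in guesses]
    return max(results, key=lambda vec: expected(vec, probs))


# Hypothetical tree exhibiting non-locality: the two MAX choices must agree.
tree = ("MIN", [
    ("MAX", [("LEAF", (1, 0)), ("LEAF", (0, 1))]),
    ("MAX", [("LEAF", (0, 1)), ("LEAF", (1, 0))]),
])
probs = (0.5, 0.5)
unbiased = biased_vm(tree, float("inf"), 0, probs)  # no vector can reach v
biased = ivm(tree, [(1, 0), (1, 1)], probs)
```

With an unreachable guess the relation degenerates to a plain expected-value comparison, and the independent tie-breaks at the two MAX nodes produce the vector (0, 0). Each real guess forces both MAX nodes to favour the same world, which recovers a vector that wins in one world.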



Iterative biasing enables the ivm algorithm to tackle non-locality by, on each 
iteration, introducing a dependency between all MAX selections in a tree. To see 
that ivm can correctly analyse problems that vector-mm cannot, simply consider 
again the tree of Figure El For any of the guesses {1,1}, {1,2}, or {1,3}, ivm 
will return an optimal strategy. Each guess results in a different optimal strategy, 
however, that wins in just one world ($w_1$, $w_2$, or $w_3$, respectively). 

For trees with binary payoffs, it is always possible to construct the simple 
set $S = \{\{1, 1\}, \{1, 2\}, \ldots, \{1, n\}\}$, which guesses the value $v = 1$ for 
each of the $n$ possible worlds. For games where the payoffs can take more than 
two values, however, we suggest the more general 
$S = \{\{v_{max}, 1\}, \{v_{max}, 2\}, \ldots, \{v_{max}, n\}\}$. 
Here, $v_{max}$ is the largest of the (perfect information) minimax values of the 
root of the game tree in each individual world (such values can be efficiently 
calculated, as in the first step of the prm algorithm, for example). The value 
of $v_{max}$ is also an upper bound on the value of any entry in the optimal payoff 
vector, $K_{max}$. Thus, such payoff guesses are appropriate for Bridge, where a 
common task is to identify the strategy with the best chance of producing the 
maximum possible number of tricks. In fact, a simple efficiency improvement can 
be made by omitting any guess $\{v_{max}, k\}$ for which the (perfect information) 
minimax value of the game tree in world $w_k$ is less than $v_{max}$. This is justified 
by noting that the value of $K_{max}[k]$ will never be $v_{max}$ if even the best possible 
play in $w_k$ itself cannot produce $v_{max}$ tricks. 
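
The construction of the guess set, including the pruning of hopeless guesses, might look as follows in Python. The function name and the assumption that the per-world perfect-information minimax values are already available as a list are illustrative only; world indices are 0-based in this sketch.

```python
def build_guesses(world_values):
    """Build S = {(v_max, k)} from per-world perfect-information minimax values.

    Guesses for worlds whose own minimax value is below v_max are omitted,
    since K_max[k] can never reach v_max in such a world.
    """
    v_max = max(world_values)
    return [(v_max, k) for k, value in enumerate(world_values) if value == v_max]


# Hypothetical minimax values for four worlds (e.g. tricks obtainable).
guesses = build_guesses([3, 5, 5, 2])
```

Here only the second and third worlds can yield the maximum, so just two biased searches would be needed instead of four.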

5.3 Summary of Heuristics 

We have introduced the heuristics of beta-reduction and iterative biasing, 
demonstrating how they address the problem of non-locality by introducing 
dependencies between choices at MAX nodes. We described how each heuristic 
represents an improvement over the basic vector-mm algorithm, and also noted that 
beta-reduction could be combined with payoff-reduction to produce the prm-beta 
algorithm. In fact, there are a total of eight possible algorithms that can 
be produced by combinations of payoff-reduction, beta-reduction, and iterative 
biasing, as shown in Figure 14. 



    Basic algorithm:        vector-mm 
    Add one heuristic:      prm, vm-beta, ivm 
    Add two heuristics:     prm-beta, ivm-beta, iprm 
    Add three heuristics:   iprm-beta 

Fig. 14. Possible combinations of heuristics 



In the following section we present test results that demonstrate the practical 
use of these algorithms. First, however, we give a further intuition on their 
characteristics by examining how they perform on the tree used for our complexity 
proof. The summary in Figure 15 details the node selections made on 
this tree by each of the eight algorithms. The original vector-mm algorithm can 
at best find a 1-clique, if it makes a fortunate guess at node $v_3$. Payoff-reduction 
cannot improve on this, as the minimax value of each individual world is one. 
However, both beta-reduction and iterative biasing improve the result, and when 
they are used together the optimal solution is found. 

6 Test Results 

We tested the algorithms in Figure 14 on random game trees and on a hard set 
of problems from the game of Bridge. The results of these tests are presented 
below. 



6.1 Experiments on Random Trees 

We follow the approach of 0 in conducting tests using complete binary trees, 
with n = 10 worlds and payoffs of just one or zero. These payoffs are assigned 

Algorithm     Nodes selected                                Equivalent clique 

vector-mm     v_3 selected with 50% probability.            1-clique or 0-clique 

prm           Same as vector minimaxing, since minimax      1-clique or 0-clique 
              value is one in every world. 

vm-beta,      If v_3 is selected (again, a 50% chance)      2-clique 
prm-beta      v_4 is also selected. If v_3 is not           (3-clique) 
              selected, v_4 and v_5 are selected. 
              (If tree is analysed right to left, v_1, 
              v_2 and v_3 are selected.) 

ivm, iprm     For any payoff guess of 1 in w_k, the         2-clique or 1-clique 
              corresponding v_k will be selected. For 
              the guesses k = 1, k = 2, and k = 4 there 
              is a 50% chance that v_3 may also be 
              selected. 

ivm-beta,     v_1, v_2 and v_3 only selected if the         3-clique 
iprm-beta     payoff guess is a 1 in world w_1. All 
              other guesses lead to the selection of 
              just two nodes. 


Fig. 15. Performance comparison on tree generated by complexity proof (see 
Figure |3) 



by an application of the Last Player Theorem na, so that the probability of a 
forced win for MAX in the complete information game tree in any individual 
world is the same for trees of all depths. The game trees are further modified by 
a probability, q , that determines how similar the possible worlds are. To generate 
a tree with n worlds and a given value of q: 

- first randomly generate payoffs for n worlds, then 

- generate a set of payoffs for a dummy world $w_{n+1}$, 

- and finally, for each of the original n worlds, overwrite the complete set of 
payoffs with the payoffs from the dummy world, with probability q. 
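
The three generation steps can be sketched as below. This is a simplified illustration, not the authors' generator: the leaf win probability `p_win` is a hypothetical parameter standing in for the value prescribed by the Last Player Theorem, and the function returns only the list of leaf payoff vectors rather than a full tree.

```python
import random


def correlated_payoffs(num_leaves, n, q, p_win=0.5, rng=None):
    """Return one 0/1 payoff vector (length n) per leaf.

    Each of the n worlds is replaced wholesale by a shared dummy world
    with probability q, which makes the worlds more similar as q grows.
    p_win is an assumed per-leaf win probability (hypothetical parameter).
    """
    rng = rng or random.Random()

    def random_world():
        return [1 if rng.random() < p_win else 0 for _ in range(num_leaves)]

    worlds = [random_world() for _ in range(n)]        # step 1
    dummy = random_world()                             # step 2
    worlds = [list(dummy) if rng.random() < q else w   # step 3
              for w in worlds]
    # Transpose: one payoff vector per leaf, indexed by world.
    return [tuple(w[leaf] for w in worlds) for leaf in range(num_leaves)]


identical = correlated_payoffs(8, 5, q=1.0, rng=random.Random(0))
independent = correlated_payoffs(8, 5, q=0.0, rng=random.Random(0))
```

With q = 1 every world is overwritten by the dummy world, so each leaf vector is constant across worlds; with q = 0 the worlds remain independent.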

Trees with a higher value of q tend to be easier to solve, because an optimal 
strategy in one world is also more likely to be an optimal strategy in another. In 
Figure 16 we show the results obtained when q = 0.75, a value we chose because 
it produces similar results to the game of Bridge. This graph was produced by 
carrying out the following steps 1000 times for each data point of tree depth and 
opponent knowledge: 

1. Generate a random test tree of the required depth. 

2. Use each algorithm to identify a strategy. We assume that, for each algo- 
rithm, the payoffs in all worlds can be examined. 

3. Compute the payoff of the selected strategies under the best defence model. 

4. Use an inefficient, but correct, algorithm (based on examining every strategy) 
to find an optimal strategy and payoff. 

5. For each algorithm, check whether it is in error (i.e., whether the value 
of the strategy it found, as computed in Step 3, is inferior to the value of the 
strategy found in Step 4, assuming equally likely worlds). 
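
Steps 4 and 5 can be illustrated with a brute-force enumeration of the payoff vectors achievable by MAX strategies under the best defence model. The sketch below is an assumption-laden illustration (tuple tree encoding, equally likely worlds, hypothetical function names) and is exponential in the size of the tree, as the text indicates for the reference algorithm.

```python
from itertools import product


def strategy_payoffs(tree):
    """Set of payoff vectors achievable by some MAX strategy (exhaustive)."""
    kind, body = tree
    if kind == "LEAF":
        return {tuple(body)}
    child_sets = [strategy_payoffs(child) for child in body]
    if kind == "MAX":
        # MAX commits to one child, so any child's vector is achievable.
        return set().union(*child_sets)
    # MIN node: MAX needs a sub-strategy under every child, and MIN then
    # picks the worst child separately in each world.
    return {tuple(min(col) for col in zip(*combo))
            for combo in product(*child_sets)}


def optimal_value(tree):
    """Best expected payoff over all strategies, with equally likely worlds."""
    return max(sum(vec) / len(vec) for vec in strategy_payoffs(tree))


def in_error(found_value, tree):
    """Step 5: an algorithm errs if its strategy is worth less than the optimum."""
    return found_value < optimal_value(tree)


# Hypothetical test tree with two worlds.
tree = ("MIN", [
    ("MAX", [("LEAF", (1, 0)), ("LEAF", (0, 1))]),
    ("MAX", [("LEAF", (0, 1)), ("LEAF", (1, 0))]),
])
best = optimal_value(tree)
```

On this tree the achievable vectors are (1, 0), (0, 1) and (0, 0), so any algorithm returning a strategy worth less than 0.5 would be counted as an error.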



Error rate when opponent knowledge is 1 and q=0.75 




Fig. 16. Algorithm performance on random game trees where the optimal strat- 
egy in one world is more likely to be optimal in another 



The graph for vm-beta shows that whilst it improves on simple vector min- 
imaxing, the improvement is not as large as that produced by ivm or prm. 
However, a combination of heuristics performs better, with prm-beta performing 
at 100%. 



6.2 Experiments on the Game of Bridge 

Bridge has been heavily analysed by human experts, who have produced texts 
that describe the optimal play in large numbers of situations. The availability of 
such references provides a natural way of assessing the performance of automated 
algorithms. In fact, it was careful scrutiny of such expert analyses that led to 
the formalisation of the best defence model; thus, the game provides an excellent 
test of the algorithms in this paper. 

To construct a Bridge test set, we used as an expert reference the Official 
Encyclopedia of Bridge, published by the American Contract Bridge League 
JI]. This book contains a 55-page section presenting optimal lines of play for 
a selection of 665 single-suit problems. Of these, we collected the 650 examples 
that gave pure strategies for obtaining the maximum possible payoff against best 
defence (see footnote 1). Using the Finesse Bridge-playing system, we then tested 
Monte-carlo sampling, and the eight algorithms summarised in the previous section, 
against the solutions from the Encyclopedia. (For the iterative algorithms, the 
set of payoff guesses was produced by finding the worlds for which the perfect 
information minimax value of the game tree was equal to the maximum possible 
number of tricks, as described at the end of Section 5.2.) 

We compared the expected payoff of the strategies produced by each algorithm 
(for the maximum possible number of tricks) with the expected payoff 
of the solution given in the Encyclopedia. The results summarised in Figure 17 
show how often each algorithm was optimal. As in our tests on random trees, 
vector minimaxing is again slightly more accurate than Monte-carlo sampling, 
and correctness further improves as heuristics are added. The most effective 
individual heuristic is payoff-reduction (prm outperforms both ivm and vm-beta). 
When payoff-reduction, beta-reduction and iterative biasing are all combined 
in the iprm-beta algorithm, sub-optimal strategies are only generated for two 
problems. Given that our algorithms also revealed nine errors in the test set 
itself, however, this performance (and also that of ivm-beta and iprm) is actually 
better than that of the human experts who produced the model solutions. In fact, 
we traced the cause of iprm-beta’s two errors to a problem with Finesse itself that 
resulted in the optimal strategies not actually being present in the search space. 
We intend to correct this design error in the near future. 



Algorithm      Optimal        Sub-optimal    Expected Loss   Time (s) 

Monte-carlo    430 (66.2%)    220 (33.8%)    17.00           8.1 
vector-mm      460 (71.8%)    190 (28.2%)    12.81           3.8 
vm-beta        555 (85.4%)    95 (14.6%)     6.24            4.3 
ivm            613 (94.3%)    37 (5.7%)      1.61            25.5 
prm            622 (95.7%)    28 (4.3%)      0.86            15.5 
prm-beta       638 (98.2%)    11 (1.8%)      0.34            19.6 
ivm-beta       645 (99.2%)    5 (0.8%)       0.23            96.3 
iprm           645 (99.2%)    5 (0.8%)       0.13            104 
iprm-beta      648 (99.7%)    2 (0.3%)       0.06            101 

Fig. 17. Performance on the 650 single-suit problems from the Encyclopedia of 
Bridge 



1 The remaining fifteen examples split into four categories: six problems that give no 
line of play for the maximum number of tricks, four problems involving the assump- 
tion of a mixed strategy defence, four for which the solution relies on assumptions 
about the defenders playing sub-optimally by not false-carding, and one where there 
are constraints on the cards that can be played. 

Our results table also includes an ‘Expected Loss’ column, which gives the 
number of times that the sub-optimal strategies produced by each algorithm can 
be expected to result in inferior performance. This figure measures the expected 
number of times that the Encyclopedia’s strategies would out-perform each al- 
gorithm when playing the entire set of 650 problems once (against best defence 
and with a random choice among the possible holdings for the defence). The 
value is produced by simply summing, over every problem in the test set, the 
chance of success of the Encyclopedia’s strategy minus the chance of success of 
the strategy produced by the algorithm in question. When measured by expected 
loss, the superiority of iprm-beta over Monte-carlo sampling or vector-mm is less 
marked. However, note that there is at least one task for which optimality is the 
crucial factor, namely the creation of tutoring systems where a computer must 
generate (and perhaps even explain) the best way to play a game. One natural 
application of iprm-beta , therefore, is as the basis for such a system. 
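
The expected-loss figure described above is a plain sum of per-problem differences in success probability. A minimal sketch, assuming the success chances of the reference strategy and of the algorithm's strategy are available as two parallel lists (hypothetical inputs, not data from the paper):

```python
def expected_loss(reference_success, algorithm_success):
    """Sum, over all problems, of reference chance minus algorithm chance."""
    return sum(ref - alg for ref, alg in zip(reference_success, algorithm_success))


# Hypothetical success probabilities for three problems.
loss = expected_loss([0.75, 1.0, 0.5], [0.5, 1.0, 0.5])
```

A problem on which the algorithm is optimal contributes zero, so only the sub-optimal cases add to the total.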

The ‘Time’ column gives the average number of seconds required for a single 
problem (on a Sun SPARCstation Ultra2 running at 200MHz). We have not 
paid particular attention to the efficiency of our implementations (for example, 
none of the beta-reduction algorithms actually incorporate pruning to speed up 
the search). Nevertheless the speeds are acceptable, with prm-beta , in particular, 
offering a good trade-off of accuracy against speed. The iterative algorithms may 
appear particularly slow, but note that they can all be used in ‘any-time’ fashion, 
returning the best result encountered so far when available time is exhausted. 

7 Conclusions 

We have investigated the problem of finding optimal strategies under the best 
defence model of an imperfect information game. We demonstrated that this 
problem is NP-complete in the size of the game tree, and introduced the new 
heuristics of beta-reduction and iterative biasing. We presented test results that 
demonstrated the effectiveness of these heuristics, particularly when combined 
with payoff-reduction minimaxing to produce the iprm-beta algorithm. On our 
database of problems from the game of Bridge, iprm-beta actually makes fewer 
errors than the human experts that produced the model solutions. It thus 
represents the first general search algorithm capable of consistently performing 
at and above expert level on a significant aspect of Bridge card play. 

Abstract. In this contribution we propose a strategy which focuses on 
the game as well as on the opponent. Preference is given to the thoughts 
of the opponent, so that the strategy might be speculative. We describe 
a generalization of OM search, called (D, d)-OM search, where D stands 
for the depth of search by the player and d for the opponent’s depth 
of search. A difference in search depth can be exploited by deliberately 
choosing a suboptimal move in order to gain a larger advantage than when 
playing the optimal move. The idea is that the opponent does not 
see the variant in sufficiently deep detail. Simulations using a game-tree 
model including an opponent model as well as experiments in the domain 
of Othello confirm the effectiveness of the proposed strategy. 
Keywords: opponent modelling, speculative play, α-β² pruning, Othello. 



1 Introduction 

In minimax and its variants there is an implicit assumption that the player and 
the opponent use the same search strategy, i.e., (1) the leaves are evaluated by 
an evaluation function and (2) the values are backed up via a minimax-like 
procedure. The evaluation function may contain all kinds of sophisticated features 
but it evaluates the position according to preset criteria (including also the use 
of quiescence search). It never changes the value of a Knight in the evaluation 
function, although it “knows” that the opponent has a strong reputation for 
playing with two Knights in the endgame. So, the evaluation function shows 
stability and is not speculative. The minimax back-up procedure is well-established 
and is as logical as one can think. So far no other idea has emerged, except for one 
final decision of the back-up procedure. If the result is a draw (e.g., by repetition 
of positions) and the opponent is assumed to be weak, a contempt factor may 
indicate that playing the second-best move is preferred. This is the most 
elementary step of opponent modelling. It shows a clear deviation from the 
minimax-like strategy. 

An extension of the idea of anticipating the opponent’s weakness has been 
developed in opponent-model search. According to this framework a grandmas- 
ter often attempts to understand the intention behind the opponent’s previous 
moves and then employs some form of speculative play, anticipating the op- 
ponent’s weak reply if)]. Iida et al. modelled such grandmaster’s thinking pro- 
cesses based on possible opponent’s mistakes, and proposed OM search (short for 
Opponent-Model search) pTl r )) as a generalized game-tree search. In OM search 
perfect knowledge of the opponent’s evaluation function is assumed. This knowl- 
edge may lead to the conclusion that the opponent is expected to make an error 
in a given position. As a consequence the error may be exploited to the advan- 
tage of the player possessing the knowledge. In such an OM-search model, it is 
implicitly assumed that both players search to the same depth. 

In actual game-playing, e.g., in Shogi tournaments, we have observed [5] that 
the two players may not only use different evaluation functions, but also reach 
different search depths. Therefore, we propose a generalization of OM search, 
called (D, d)-OM search, in which the difference of depth is incorporated, with 
D standing for the depth of search of the first player, and d for the opponent. 
We will show that exploiting this difference leads to a speculative strategy. 

In section 2 we introduce (D, d)-OM search by some definitions and assumptions 
and describe a (D, d)-OM-search algorithm. Then the characteristics of 
(D, d)-OM search are considered in section 3, and the relationship between 
(D, d)-OM search, OM search, and minimax search is discussed. In section 4, 
an improved version in which branches are pruned is introduced. It is denoted 
by α-β² pruning. Section 5 illustrates the performance of the implicitly proposed 
speculative strategy with random-tree simulations as well as with experiments 
in the domain of Othello. How to apply this strategy efficiently to actual game 
positions is discussed in section 6. Finally, conclusions and limitations of this 
speculative strategy are given in section 7. 



2 (D, d)-OM Search 

In this section, (D, d)-OM search is outlined by definitions and assumptions. 
In addition, an example is supplied showing how a value at any position in a 
search tree is computed using (D, d)-OM search. By convention and for clarity, 
the two players are distinguished as the max player and the min player. Below, 
we discuss (D, d)-OM search from the viewpoint of the max player. 



2.1 Definitions and Assumptions 

For the description of (D, d)-OM search, we use the following definitions and 
assumptions. 

Definition 1. [playing strategy] 

A playing strategy is a three-tuple (D, EV, SS), where D is the player’s search 
depth, EV the static evaluation function used, and SS the search strategy, i.e., 
the method of backing up the values from the leaves to the root in a search tree. 

Definition 2. [player model] 

A player model is the assumed playing strategy of a player. For any player X 
with search depth $D_X$, static evaluation function $EV_X$, and search strategy 
$SS_X$, we define a player model as $M_X = (D_X, EV_X, SS_X)$. 

Below we provide three assumptions for (D, d)-OM search. In the following 
OM stands for OM search, MM for minimax search, and P is a given position 
in which the max player is to move. 

Assumption 1 

The min player’s playing strategy $M_{min}$ is defined as $(d, EV_{min}, MM)$, which 
means that the min player performs some minimax strategy at any successor of 
P and evaluates the leaf positions at depth (d + 1) in the max player’s game-tree 
using the static evaluation function $EV_{min}$. 

Assumption 2 

The max player knows the strategy of the min player, $M_{min} = (d, EV_{min}, MM)$, 
i.e., the max player’s model of the min player coincides with the min player’s 
actual strategy. 

Assumption 3 

The max player employs $(D, EV_{max}, (D, d)\text{-OM})$ as playing strategy, which 
means that the max player evaluates the leaf positions at depth D using the static 
evaluation function $EV_{max}$ and backs up the values by (D, d)-OM search. 



(D, d)-OM search mimics grandmaster play in that it uses speculations on 
what the opponent “sees”. The player acquires and uses the model of the 
opponent to find a potential mistake, and then obtains an advantage by anticipating 
this error. 



2.2 The Algorithm of (D, d)-OM Search 

In (D, d)-OM search, a pair of values is computed for all positions above depth 
(d + 1). One value comes from the opponent model and one from the max player’s 
model. Below depth (d + 1), the max player no longer uses any opponent model. 
There, only one value is computed for each position; it is backed up by minimax 
search. 

Let i, j from now on range over all immediate successor positions of a node 
in question. Let a node be termed a max node if the max player is to move, a 
min node otherwise. According to the assumptions, D is the search depth of the 
max player and d is the search depth of the min player as predicted by the max 
player. Then the function V(P, OM(D, d)) is defined for relevant nodes as the 
value considered by the max player, and V(P, MM(d)) as the value for the min 
player, predicted by the max player. 



\[
V(P, OM(D, d)) =
\begin{cases}
\max_i V(P_i, OM(D-1, d-1)) & \text{if $P$ is an interior max node} \\
V(P_j, OM(D-1, d-1)) \text{ with $j$ such that } V(P_j, MM(d-1)) = \min_i V(P_i, MM(d-1)) & \text{if $P$ is an interior min node and } d \ge 0 \\
\min_i V(P_i, OM(D-1, d-1)) & \text{if $P$ is an interior min node and } d < 0 \\
EV_{max}(P) & \text{if } D = 0 \text{ ($P$ is a leaf node)}
\end{cases}
\tag{1}
\]

\[
V(P, MM(d)) =
\begin{cases}
\max_i V(P_i, MM(d-1)) & \text{if $P$ is an interior max node} \\
\min_i V(P_i, MM(d-1)) & \text{if $P$ is an interior min node} \\
EV_{min}(P) & \text{if } d = -1 \text{ ($P$ is a ``leaf'' node)}
\end{cases}
\tag{2}
\]



The pseudocode for the (D, d)-OM search algorithm is given in Figure 1. 

An example of (D, d)-OM search is shown in Figure 2. The search tree shows 
two different root values due to the use of two different models of the players. 
Using (3, 1)-OM search yields a value of 11 and using plain minimax yields a 
value of 9. In this example, the max player may thus achieve a better result than 
by minimax. It does so by selecting the left branch. For clarity, we note that d 
denotes the search depth for the opponent, which is reached at depth d + 1 in 
the search tree of the first player. In the example, the nodes at depth 2 thus will 
be evaluated for both players, while those at depth 3 will only be evaluated for 
the first player. 

procedure (D, d)-OM(P, depth): 
    /* Iterative deepening at root P */ 
    /* Two values are returned, according to equations (2) and (1) */ 
    if depth = d + 1 then begin 
        /* Evaluate the min-player's leaf nodes */ 
        V_MM[P] <- Evaluate(P, min) 
        V_OM[P] <- Minimax(P, depth) 
        return (V_MM[P], V_OM[P]) 
    end 
    {P_i | i = 1, ..., n} <- Generate(P) 
    /* Expand P to generate all its successors P_i */ 
    for each P_i do begin 
        (V_MM[P_i], V_OM[P_i]) <- (D, d)-OM(P_i, depth+1) 
    end 
    /* Back up the evaluated values */ 
    if P is a max node then begin 
        /* At a max node both the max player and the min player back up the maximum */ 
        V_MM[P] <- max over 1 <= i <= n of V_MM[P_i] 
        V_OM[P] <- max over 1 <= i <= n of V_OM[P_i] 
    end 
    else begin /* P is a min node */ 
        /* At a min node, the min player backs up the minimum and the max player 
           backs up the value of the node selected by the min player */ 
        V_MM[P] <- V_MM[P_j] = min over 1 <= i <= n of V_MM[P_i] 
        V_OM[P] <- V_OM[P_j] 
    end 
    return (V_MM[P], V_OM[P]) 

procedure Minimax(P, depth): 
    /* Iterative deepening below depth d + 1 */ 
    /* Returns the minimax value according to the max player */ 
    if depth = D then begin 
        /* Evaluate the max player's leaf nodes */ 
        V_MM[P] <- Evaluate(P, max) 
        return (V_MM[P]) 
    end 
    {P_i | i = 1, ..., n} <- Generate(P) 
    /* Expand P to generate all its successors P_i */ 
    for each P_i do begin 
        V_MM[P_i] <- Minimax(P_i, depth+1) 
    end 
    /* Back up the evaluated values */ 
    if P is a max node then begin 
        V_MM[P] <- max over 1 <= i <= n of V_MM[P_i] 
    end 
    else begin /* P is a min node */ 
        V_MM[P] <- min over 1 <= i <= n of V_MM[P_i] 
    end 
    return (V_MM[P]) 

Fig. 1. Pseudocode for the (D, d)-OM search algorithm. 

□ max node 
o min node 




The numbers inside the circles/boxes represent the back-up values by minimax search 
from the max player’s point of view. The upper numbers beside the circles/boxes 
represent the back-up values by (3, 1)-OM search and the lower numbers the back-up 
values of the minimax search from the min player’s point of view. The depths 3 and 2 
contain the leaf positions for the max player and the min player, respectively, i.e., 
these values (in italics) are evaluated statically using the max player’s or the min 
player’s evaluation function. 

Fig. 2. (D, d)-OM search and minimax compared, with D = 3 and d = 1. 
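
To make the procedures of Figure 1 concrete, here is a small Python sketch on an explicit tree. It is not the authors' implementation: the `Node` class, the stored `ev_max`/`ev_min` fields standing in for the two evaluation functions, and the example tree are all assumptions made for illustration, and the tree is a made-up one rather than the tree of Figure 2. Remaining depths D and d are counted down instead of tracking an absolute depth.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    children: List["Node"] = field(default_factory=list)
    ev_max: Optional[float] = None  # max player's static evaluation (hypothetical)
    ev_min: Optional[float] = None  # min player's static evaluation (hypothetical)


def minimax(node, depth_left, is_max, ev):
    """Plain minimax to a fixed remaining depth with evaluation function ev."""
    if depth_left == 0 or not node.children:
        return ev(node)
    values = [minimax(c, depth_left - 1, not is_max, ev) for c in node.children]
    return max(values) if is_max else min(values)


def om_search(node, D, d, is_max=True):
    """Return (V_MM, V_OM) for node, with D and d as remaining depths."""
    if d == -1:
        # Depth d + 1 reached: a leaf for the min player, and the max
        # player continues with ordinary minimax below this point.
        return node.ev_min, minimax(node, D, is_max, lambda n: n.ev_max)
    pairs = [om_search(c, D - 1, d - 1, not is_max) for c in node.children]
    if is_max:
        return max(p[0] for p in pairs), max(p[1] for p in pairs)
    # Min node: the opponent picks the child minimising its own value,
    # and the max player backs up its value for that same child.
    j = min(range(len(pairs)), key=lambda i: pairs[i][0])
    return pairs[j]


def leaf(value):
    return Node(ev_max=value)


# Made-up tree for D = 3, d = 1 (depth-2 nodes carry the opponent's evaluation).
root = Node(children=[
    Node(children=[Node([leaf(7), leaf(3)], ev_min=5),
                   Node([leaf(4), leaf(2)], ev_min=8)]),
    Node(children=[Node([leaf(5), leaf(1)], ev_min=6),
                   Node([leaf(6), leaf(5)], ev_min=7)]),
])
v_mm, v_om = om_search(root, D=3, d=1)
plain = minimax(root, 3, True, lambda n: n.ev_max)
```

In this toy tree the opponent's shallow evaluation steers it into the left branch's first reply, so the speculative value exceeds the plain minimax value of the root, in the same spirit as the example of Figure 2.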



We remark that the player using (D, d)-OM search always searches deeper 
than the opponent, i.e., D > d. Cases in which the opponent is modelled by 
a deep search using a very fast but simplistic evaluation function, and the first 
player is modelled as relying on a shallower search with a sophisticated evaluation 
function, are not treated in the above formulation. 



3 Characteristics of (D, d)-OM Search 

In this section, some characteristics of (D, d)-OM search are described and 
compared with those of minimax search. First, the relations among (D, d)-OM 
search, OM search, and minimax search are discussed, and two remarks are 
made. Then a theorem relating root values by (D, d)-OM search and minimax 
search is stated. 

3.1 Relations among (D, d)-OM Search, OM Search, and Minimax 

The (D, d)-OM search algorithm indicates that the max player performs minimax 
search to back up the static-evaluation-function values from depths (d + 1) to 
D, while from depths 1 to (d + 1) the max player performs pure OM search. So 
from the viewpoint of search algorithms, (D, d)-OM search can be considered as 
the combination of pure OM search and minimax search. 

Viewed differently, all the moves determined by minimax search, OM search, 
and (D, d)-OM search take some opponent model into account, i.e., each choice 
is based on the player’s own model and some opponent model. Accordingly, all 
three strategies can be considered as opponent-model-based search strategies. 
The difference among them lies in the specification of the opponent model. 

The opponent models used by the max player in minimax search, OM search, 
and (D, d)-OM search are listed in Table 1. We assume that the max player moves 
first with search depth D and evaluation function $EV_{max}$, i.e., in a game tree 
the root is a max position. 



Algorithm            The opponent model 

minimax search       $(D - 1, EV_{max}, MM)$ 
OM search            $(D - 1, EV_{min}, MM)$ 
(D, d)-OM search     $(d, EV_{min}, MM)$ 

Table 1. The opponent models used in minimax search, OM search, and 
(D, d)-OM search. 



Table 1 shows that OM search is a generalization of minimax search (in which 
the opponent does not necessarily use the same evaluation function as the max 
player), and (D, d)-OM search is a generalization of OM search (in which the 
opponent does not necessarily search to the same depth as the max player). This 
is more precisely formulated by the following two remarks. 

Remark 1 

(D, d)-OM search is identical to OM search when d = D - 1. 



Remark 2 

(D, d)-OM search is identical to minimax when d = D - 1 and $EV_{min} = EV_{max}$. 

Therefore, of the opponent models used in the three search algorithms, the 
one in (D, d)-OM search is the most flexible, because it places the fewest 
restrictions on the opponent’s search depth and evaluation function. So, 
(D, d)-OM search is the most general mechanism of the three, and is in principle 
the most widely applicable in practice. 

3.2 A Theorem on Root Values 

Based on the different back-up procedures of the evaluation-function values, the 
following theorem can be proven. 

Theorem 1. 

For the root position R in a game tree we have the following relation: 

\[ V(R, OM(D, d)) \ge V(R, MM(D)), \tag{3} \]

where V(R, OM(D, d)) denotes the value at root R by (D, d)-OM search and 
V(R, MM(D)) that by minimax search with search depth D. The theorem is 
proven by induction on the level in the game tree. 

The above theorem implies that if the max player has a perfect opponent 
model, (D, d)-OM search based on such a model can enable the max player to 
reach a position that may be better, but should never be worse, than the one 
yielded by the minimax search. In other words, we face the common assumption: 
the deeper the search, the higher the playing strength. 

4 ck-/ 3 2 Pruning ( D , d)-OM Search 

In this section, we introduce an efficient variant of ( D , d)- OM search, called a-/3 2 
pruning (D,d)-OM search. 

As is well known, the number of nodes visited by a search algorithm in- 
creases exponentially with the search depth. This obviously limits the scope 
of the search, especially because game-playing programs have to meet exter- 
nal time constraints. Since the minimax search was introduced to game-playing, 
many techniques have been proposed to speed up the search process, such as the 
general a-/3 pruning El, the null-move method for chess [TJ and ProbCut for 
Othello |2J. On the basis of a-f3 pruning, Iida et al. proposed /3-pruning as an 
enhancement for OM search dj . 

(D, d)-OM search backs up the static-evaluation-function values from depth
D to depth (d + 1) with minimax, and from depth (d + 1) to the root with OM
search. Hence it is possible to split (D, d)-OM search into two parts and to
speed up the search in each part separately. To preserve generality, we choose
α-β pruning to speed up the minimax part, and β-pruning for the OM-search
part. The whole algorithm is named α-β² pruning.
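The two-part back-up just described can be illustrated without any pruning. The following Python sketch is only an illustration under stated assumptions: the `Node` class, the `build_tree` helper and the path-sum leaf evaluation shared by both players are hypothetical choices made here, not taken from the paper. The function `om_search` returns the pair (V_MM, V_OM): the min player's values are backed up by minimax from depth d + 1, and the max player's values follow the move the min player is predicted to choose.

```python
import random
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical game-tree node carrying a random score r(P)."""
    r: float
    children: List["Node"] = field(default_factory=list)


def build_tree(depth: int, branching: int, rng: random.Random) -> Node:
    """Build a uniform tree of the given depth with random node scores."""
    node = Node(rng.uniform(-1.0, 1.0))
    if depth > 0:
        node.children = [build_tree(depth - 1, branching, rng)
                         for _ in range(branching)]
    return node


def minimax(node: Node, depth: int, limit: int, is_max: bool, acc: float) -> float:
    """Minimax value of `node`, searching down to absolute depth `limit`.

    `acc` is the sum of the scores on the path above `node`; a leaf is
    scored by the full path sum (an assumed evaluation function).
    """
    acc += node.r
    if depth == limit:
        return acc
    values = [minimax(c, depth + 1, limit, not is_max, acc) for c in node.children]
    return max(values) if is_max else min(values)


def om_search(node: Node, depth: int, d: int, D: int,
              is_max: bool, acc: float) -> Tuple[float, float]:
    """Return (V_MM, V_OM) for `node` in unpruned (D, d)-OM search.

    Requires 0 <= d < D and a tree of depth at least D.
    """
    if depth == d + 1:
        v_mm = acc + node.r                          # min player's leaf evaluation
        v_om = minimax(node, depth, D, is_max, acc)  # max player's deeper minimax
        return v_mm, v_om
    pairs = [om_search(c, depth + 1, d, D, not is_max, acc + node.r)
             for c in node.children]
    if is_max:
        # Both players back up the maximum at a max node.
        return max(p[0] for p in pairs), max(p[1] for p in pairs)
    # At a min node the min player picks the child with minimal V_MM;
    # the max player inherits the V_OM of that predicted choice.
    return min(pairs, key=lambda p: p[0])
```

Calling `om_search(root, 0, d, D, True, 0.0)` on a tree built with `build_tree(D, b, rng)` gives the root pair. On such trees the V_OM component should never fall below the plain D-ply minimax value, and the two should coincide when D = d + 1.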

For details about α-β UU and β-pruning [3], we refer to the literature.
Pseudocode for the α-β² algorithm is given in Figures 3 and 4.

We note that in the M* algorithm, the multi-model-based search strategy
developed by Carmel and Markovitch 0, a pruning mechanism similar to our
α-β² pruning was described. However, due to their recursive application of
opponent modelling, their pruning is not guaranteed always to yield the same
result as the non-pruning analogue. The correctness of their αβ* algorithm is
proven only when the evaluation functions of both players obey certain
conditions, in particular when they do not differ too much.

procedure α-β²(P, α, β, depth):
    /* Iterative deepening at root P */
    /* Two values are returned, according to equations (2) and (1) */
    if depth = d + 1 then begin
        /* Evaluate the min player's leaf nodes */
        V_MM[P] ← Evaluate(P, min)
        V_OM[P] ← α-β(P, α, β, d + 1)
        return (V_MM[P], V_OM[P])
    end
    {P_i | i = 1, ..., n} ← Generate(P)
    /* Expand P to generate all its successors P_i */
    for each P_i do begin
        (V_MM[P_i], V_OM[P_i]) ← α-β²(P_i, α, V_MM[P], depth + 1)
        if P is a max node then begin
            /* β-pruning at the max node */
            if V_MM[P_i] >= β then begin
                return (V_MM[P_i], V_OM[P_i])
            end
        end
    end
    /* Back up the evaluated values */
    if P is a max node then begin
        /* At a max node both the max player and the min player back up
           the maximum */
        V_MM[P] ← max_{1≤i≤n} V_MM[P_i]
        V_OM[P] ← max_{1≤i≤n} V_OM[P_i]
    end
    else begin /* P is a min node */
        /* At a min node, the min player backs up the minimum and the max
           player backs up the value of the node selected by the min player */
        V_MM[P] ← V_MM[P_j] = min_{1≤i≤n} V_MM[P_i]
        V_OM[P] ← V_OM[P_j]
    end
    /* Update the value of β */
    β ← V_MM[P]
    return (V_MM[P], V_OM[P])

Fig. 3. Pseudocode for the β-pruning part of the α-β² algorithm.



5 Experimental Results of (D, d)-OM Search

In this section, we describe two experiments on the performance of (D, d)-OM
search, one with a game-tree model including an opponent model and the other
in the domain of Othello. The main purpose of these experiments is to confirm
the effectiveness of the proposed speculative strategy when a player has perfect
knowledge of the opponent model.

procedure α-β(P, α, β, depth):
    /* Iterative deepening below depth d + 1 */
    /* Returns the minimax value according to the max player */
    if depth = D then begin
        /* Evaluate the max player's leaf nodes */
        V_MM[P] ← Evaluate(P, max)
        return (V_MM[P])
    end
    {P_i | i = 1, ..., n} ← Generate(P)
    /* Expand P to generate all its successors P_i */
    for each P_i do begin
        V_MM[P_i] ← α-β(P_i, α, β, depth + 1)
        if P is a max node then begin
            if V_MM[P_i] > α then begin
                α ← V_MM[P_i]
            end
            if α >= β then begin
                return (α)
            end
        end
        else begin /* P is a min node */
            if V_MM[P_i] < β then begin
                β ← V_MM[P_i]
            end
            if α >= β then begin
                return (β)
            end
        end
    end
    /* Back up the evaluated values */
    if P is a max node then begin
        V_MM[P] ← max_{1≤i≤n} V_MM[P_i]
    end
    else begin /* P is a min node */
        V_MM[P] ← min_{1≤i≤n} V_MM[P_i]
    end
    return (V_MM[P])

Fig. 4. Pseudocode for the α-β-pruning part of the α-β² algorithm.

5.1 Experiments with Random Trees

In order to investigate the performance of a search algorithm, a number of
game-tree models have commonly been used D2E33- However, for OM-like
algorithms we need a model that includes an opponent model. Iida et al. have
proposed a game-tree model to measure the performance of OM search and
tutoring-search algorithms 0- On the basis of this model, we build another
game-tree model including the opponent model to estimate the performance of
(D, d)-OM search. As a measure of performance, we use the H value of an
algorithm, as we did for OM search. With this game-tree model and the H values,
the performance of (D, d)-OM search is studied.



Game-Tree Model. The game-tree model we use for this experiment is a uniform
tree. A random score is assigned to each node in the game tree and the
scores at leaf nodes are computed as the sum of the numbers on the path from the
root to the leaf node. This incremental model was also proposed by Newborn
m and goes back to a scheme proposed by Knuth and Moore m. The max
player's score for a leaf position at depth D (say P_D) is calculated as follows:

$$EV_{max}(P_D) = \sum_{k=0}^{D} r(P_k); \qquad (4)$$

the min player's score for a leaf position at depth (d + 1) (say P_{d+1}) is
calculated as follows:

$$EV_{min}(P_{d+1}) = \sum_{k=0}^{d+1} r(P_k), \qquad (5)$$

where -R ≤ r(·) ≤ R, r(·) has a uniform random distribution, and R is
an adjustable parameter. The resulting random numbers at leaf nodes have an
approximately normal distribution. Note that the min player uses the same
random score r(·) as the max player. It is implied that EV_max = EV_min when
D = d + 1. In this case, (D, d)-OM search is identical to the minimax strategy
according to Remark 2.
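The incremental scoring of Equations (4) and (5) can be sketched for a single root-to-leaf path. The Python code below is a minimal illustration; the function names and the representation of a path as a list of scores are assumptions made here, not part of the original model description.

```python
import random
from typing import List


def path_scores(depth: int, R: float, rng: random.Random) -> List[float]:
    """Random scores r(P_0), ..., r(P_depth) along one path, uniform on [-R, R]."""
    return [rng.uniform(-R, R) for _ in range(depth + 1)]


def ev_max(scores: List[float], D: int) -> float:
    """Max player's leaf score at depth D: sum of r(P_k) for k = 0..D (Eq. 4)."""
    return sum(scores[: D + 1])


def ev_min(scores: List[float], d: int) -> float:
    """Min player's leaf score at depth d + 1: sum of r(P_k) for k = 0..d+1 (Eq. 5)."""
    return sum(scores[: d + 2])
```

Because both players read the same r(·) values, `ev_min(scores, D - 1)` equals `ev_max(scores, D)` on any path, which is the case D = d + 1 discussed above. For smaller d the min player simply sees a shorter prefix of the same path.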

This game-tree model comes close to approximating the parent/child behaviour
in real game trees and reflects a game tree including models for both
players, in which different opponent models are simulated by various search
depths d. For this game-tree model, we recognize that the strength of the min
player is equal to that of the max player when d = D - 1 and that the min
player has less information from the search tree about a given position when
d < D - 1. Note that we only investigate positions for which d ≤ D - 1, since
otherwise (D, d)-OM search is unreliable and should not be used.



H Value. In order to estimate the performance of (D, d)-OM search we define
the so-called H value (Heuristic performance value) for the root R by

$$H(R) = \frac{V(R, OM(D, d)) - V_{min}(R, D)}{V_{max}(R, D) - V_{min}(R, D)} \qquad (6)$$

Here, V(R, OM(D, d)) represents the value at R by (D, d)-OM search. V_min(P, D)
is given by

$$V_{min}(P, D) = \min_i EV_{max}(P_i), \quad P_i \in \text{all the leaf nodes at depth } D \qquad (7)$$

V_max(P, D) is similarly given by

$$V_{max}(P, D) = \max_i EV_{max}(P_i), \quad P_i \in \text{all the leaf nodes at depth } D \qquad (8)$$

The procedure indicated by (7) obtains the minimum value of the root R by
looking ahead D plies and the strategy indicated by (8) analogously the
maximum value. H(R) then represents the normalized performance of (D, d)-OM
search and can be thought of as a characteristic of the strategy. Although the
value of this performance measure remains to be proven, we feel that the
scaling applied by using the minimum and maximum values of the leaves sets the
resulting performance in appropriate perspective.
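The normalization in the definition of H(R) amounts to rescaling a root value between the smallest and largest leaf evaluation at depth D. A minimal Python sketch follows; the function name `h_value`, its argument layout and the error raised for a degenerate tree are illustrative assumptions.

```python
from typing import Sequence


def h_value(v_om: float, leaf_values: Sequence[float]) -> float:
    """Normalized performance H(R) of a root value `v_om`.

    `leaf_values` holds EV_max for all leaf nodes at depth D; their minimum
    and maximum play the roles of V_min(R, D) and V_max(R, D).
    """
    v_min = min(leaf_values)
    v_max = max(leaf_values)
    if v_max == v_min:
        raise ValueError("H(R) is undefined when all leaf values are equal")
    return (v_om - v_min) / (v_max - v_min)
```

With this scaling a root value equal to the worst leaf maps to 0 and one equal to the best leaf maps to 1, so values obtained on different random trees can be averaged on a common scale.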

Preliminary Results on the Performance of (D, d)-OM Search. To gain
insight into the performance of (D, d)-OM search, several preliminary
experiments were performed using the game-tree model proposed above.

In a first experiment, we observed the performance of (D, d)-OM search for
various values of d. In this experiment, D is fixed at 6 and 7, and d ranges from
0 to D - 1. A comparison of (6, d)-OM search and minimax search is presented
in Figure 5, while (7, d)-OM search and minimax search are compared in Figure
6, all with a fixed branching factor of 5. All curves shown in Figures 5 and 6
are averaged results over 100 experiments.




Fig. 5. (6, d)-OM search and minimax compared (plot not reproduced; horizontal axis: d).

Fig. 6. (7, d)-OM search and minimax compared (plot not reproduced; horizontal axis: d).

Figures 5 and 6 show that

- the results are in accordance with Theorem 1 and Remark 2. In particular,
  • d = 0 means that the opponent does not perform any search at all. The
    max player therefore has to rely on minimax.
  • when d = 5 in Figure 5 and d = 6 in Figure 6, i.e., d = D - 1, the
    min player looks ahead to the same depth in the search tree as the max
    player. In this case, the max player actually performs pure OM search.
    Since EV_max(P) = EV_min(P) in our experiments, the conditions laid
    down in Remark 2 are fulfilled, and (D, d)-OM search is identical to
    minimax.

- the fluctuation in H values of (D, d)-OM search for depths d from 1 to D - 1
  hardly seems dependent on the value of d. This is explained by the fact
  that the ratio of mistakes of OM search does not depend on the depth of
  search, but only on the branching factor 0. The results may suggest that the
  fluctuation in H values of (D, d)-OM search has a maximum at d = ⌊D/2⌋.



In a second experiment, we investigated the performance of (D, d)-OM search
for various values of D. In the experiment, d is fixed at 2 and D ranges from
3 to 7. The results are shown in Figure 7, which is an averaged result over 100
experiments, again using a branching factor of 5.




Fig. 7. (D, 2)-OM search and minimax compared (plot not reproduced; horizontal axis: D).

Figure 7 tells us that the H value of (D, d)-OM search is greater than that
of D-minimax. Of course, the gain of (D, d)-OM search over D-minimax is very
small, since d is fixed at 2, which means that OM search is only performed in the
upper 2 plies, whereas in the remainder of the search tree minimax is performed.
In addition, (D, d)-OM search and D-minimax show the same fluctuation in H
values, a consequence of both using the same evaluation function.

5.2 Othello Experiments 

In the subsection above, the advantage of (D, d)-OM search over D-minimax
has been verified with random-tree-model simulations. However, simulating tree
behaviour is fraught with pitfalls m. So, let us now turn to the study of the
effectiveness of the proposed speculative strategy in real game-playing. Due to
its simple rules and relatively small branching factor, Othello is selected as a
test bed. We assume that the rules of the game are known. In determining the
final score of a game we adopt the convention that empty squares are not
awarded to any side. The net score is defined as the difference in the number of
stones at the end of a game, e.g., in a game with final score 38-25 the first
player has a net score of 13.



Experimental Design. For easy comparison, we let program A with model
M_A = (D, EV, (D, d)-OM) and program B with model M_B = (D, EV, MM) play
against program C with model M_C = (d, EV, MM). The results of A against
C compared to those of B against C then serve as a measure of the relative
strengths of (D, d)-OM search and D-MM search. EV again denotes the
evaluation function. To simplify the experiments, we do not consider the
influence of the evaluation function for the moment, i.e., we use the same
evaluation function for programs A, B and C.

In the experiments programs A and B search to the same depth D, whereas
program C searches to depth d. The cases D = d + 1, D = d + 2 and D = d + 3
are investigated.



Performance Measure. Two parameters ΔS and R_w are defined to estimate
the performance of (D, d)-OM search and D-MM search. ΔS represents the
average net score and R_w denotes the winning rate of the player. For a given
player X, ΔS(X) is given by

$$\Delta S(X) = \frac{1}{2N} \sum_{j \in \{B, W\}} \sum_{i=1}^{N} \Delta S_i^j(X) \qquad (9)$$

In this formula, ΔS_i^B(X) denotes the net score obtained by player X when
he plays with Black. Similarly, ΔS_i^W(X) is the analogous number for playing
White, and 2N represents the total number of games, equally divided over games
starting with Black and with White. Therefore, this performance measure offsets
the influence caused by having the initiative, which in general is widely believed
to be a decisive advantage in White's favor.

Programs  Performance                      d
          Measure      1          2          3          4
A vs. C   Scores       37.4/26.6  35.8/28.2  38.8/25.0  39.2/24.8
          ΔS(A)        10.8       7.6        13.8       14.4
          R_w(A)       66%        65%        69.5%      73.5%
B vs. C   Scores       37.4/26.6  35.8/28.2  38.8/25.0  39.2/24.8
          ΔS(B)        10.8       7.6        13.8       14.4
          R_w(B)       66%        65%        69.5%      73.5%

Table 2. The results of programs A and B vs. program C, for D = d + 1.

The winning rate of player X, R_w(X), is defined as

$$R_w(X) = \frac{n + m}{2N} \times 100\%, \qquad (10)$$

where n denotes the number of games won when X plays with White, and m
the number won when X plays with Black.

In our experiments, we let N = 50, i.e., a total of 100 games are played for
each case.
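Both measures are simple aggregates over the games played with each colour. The Python sketch below shows one way to compute them. The `Game` tuple layout and the function names are assumptions made here, and since the text only mentions won games, the sketch counts strict wins and leaves the treatment of draws open.

```python
from typing import List, Tuple

# One finished game as (stones of player X, stones of the opponent).
Game = Tuple[int, int]


def average_net_score(black_games: List[Game], white_games: List[Game]) -> float:
    """Average net score of X over N games as Black and N games as White."""
    if len(black_games) != len(white_games) or not black_games:
        raise ValueError("expected the same non-zero number of games per colour")
    games = black_games + white_games
    return sum(x - y for x, y in games) / len(games)


def winning_rate(black_games: List[Game], white_games: List[Game]) -> float:
    """Winning rate of X in percent, counting only strictly won games."""
    if len(black_games) != len(white_games) or not black_games:
        raise ValueError("expected the same non-zero number of games per colour")
    games = black_games + white_games
    wins = sum(1 for x, y in games if x > y)
    return 100.0 * wins / len(games)
```

Averaging over equally many games with each colour is what removes the first-move effect from the comparison; the division by `len(games)` corresponds to the factor 2N.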

Preliminary Results. Table 2 shows the results for the case D = d + 1, where
the average scores over 100 games are given in the format x/y, with x the number
of stones obtained by the first player and y by the opponent.

From Table 2 we see that programs A and B obtain identical scores against
program C, in accordance with Remark 2, i.e., that in the case D = d + 1
(D, d)-OM search is identical to D-MM search. In addition, the results indicate
that deepening search can confer some advantage. When D = d + 1, the average
winning rate is approximately 68.5%.

Table 3 lists the results for the case D = d + 2, showing that the performance
of (D, d)-OM search then is always better than that of D-MM search, though
by a small margin.

We speculate that the edge of (D, d)-OM search over D-MM search will
increase with a better evaluation function (the present one mainly just counting
disks). This is an area for future research.

Programs  Performance                      d
          Measure      1          2          3          4
A vs. C   Scores       39.9/24.1  41.7/22.3  41.2/22.8  40.2/23.8
          ΔS(A)        15.8       19.4       18.4       16.4
          R_w(A)       75.5%      78.5%      79%        76.5%
B vs. C   Scores       37.8/26.2  39.7/24.3  40.8/22.9  39.9/24.1
          ΔS(B)        11.4       15.4       17.9       15.8
          R_w(B)       68.5%      76%        78%        74.5%

Table 3. The results of programs A and B vs. program C, for D = d + 2.

Programs  Performance                 d
          Measure      1          2          3
A vs. C   Scores       43.9/20    45.4/18.6  42.1/21.9
          ΔS(A)        23.9       26.8       20.2
          R_w(A)       88%        88.5%      94%
B vs. C   Scores       41.8/22.1  43.7/20.3  44.4/19.5
          ΔS(B)        19.7       23.4       24.9
          R_w(B)       85%        86.5%      90%

Table 4. The results of programs A and B vs. program C, for D = d + 3.

Table 4 gives the results for the case D = d + 3. Again it is clear that
(D, d)-OM search is stronger than D-MM search. However, when d = 3, although
the winning rate of (D, d)-OM search is greater than that of D-MM search, the
average net gain of (D, d)-OM search is surprisingly lower. We believe that this
also is a result of the use of a simplified evaluation function. Comparing Tables
2-4, we also notice that the benefit of (D, d)-OM search over D-MM search grows
with larger difference in search depth between the opponents. Obviously, OM
search is suited to profit as much as possible from defects in the evaluation
function, which is precisely the reason why (D, d)-OM search was proposed.
Moreover, although the margins are small, we see from Tables 2-4 that
(D, d)-OM search always is as good as (when D = d + 1) or better (when
D > d + 1) than minimax. We feel that the significance of this observation also
depends on the evaluation function in use. This will be a subject of future
research.



6 Applications of (D, d)-OM Search

Since (D, d)-OM search stems from grandmasters' experience, it is implied that
the player using this strategy has a higher strength. Even then, a grandmaster
employs (D, d)-OM search only in some special cases to gain an advantage.
These include the case that the opponent is really weak, and the case that
the grandmaster reaches some weak position. Regarding the former, (D, d)-OM
search can help the player win in fewer moves or with more gains. With respect
to the latter, the grandmaster has to wait for mistakes by his opponent, in which
case (D, d)-OM search can help him to change the situation.

6.1 The Requirements for Applying (D, d)-OM Search

So far, we assumed that the max player's static evaluation function EV_max is
possibly different from the min player's one, EV_min. However, it is very
difficult to have reliable knowledge of the opponent's evaluation function to
perform (D, d)-OM search. On the other hand, knowledge of the opponent's search
depth (especially when the opponent is a machine) may be more reliable. We
therefore restrict ourselves in this section to potential applications of
(D, d)-OM search for the case EV_max = EV_min.

Under this assumption the requirements for applying the proposed (D, d)-OM
search can be given by the following Lemma.

Lemma 1.

Let δ be the search depth difference between the max player and the min player
in game-playing, i.e., δ = D - d. If δ ≥ 2, then (D, d)-OM search can be applied.

This means that the condition δ ≥ 2 gives the minimum depth difference at which
it is beneficial to use (D, d)-OM search over minimax in order to anticipate
the opponent's errors resulting from its limited search depth.

The detailed proof for the above lemma can be found in [5]. Furthermore, we
can estimate in how many ways (D, d)-OM search can be applied. Each way of
applying (D, d)-OM search is completely defined by the players' search depths
D and d, where, for definiteness, D ≥ d + 2 (from Lemma 1 and Definition EJ).
By simple discrete summation, we find for the number of ways, considering that
the min player may, from instance to instance, choose any model with depth at
most equal to d and since the max player may respond by choosing his D to
match, that



where N(D, d) denotes the number of ways of applying (D, d)-OM search.

6.2 Possible Applications 

Since (D, d)-OM search is a speculative strategy whose reliability depends on
the correctness of the model of the opponent, it may seem unlikely that such
a strategy will be of much practical use in game-playing. However, there are
several situations where such a strategy can be of significant support.

One such possible application is in building a tutoring strategy for
game-playing 0. In this case, compared with the pupil, the tutor can be
considered as a grandmaster. It is essential, if tutoring is to be successful,
that the tutor has a clear representation of his pupil. This requirement places
tutoring strategies in the wider context of methods possessing a clear picture
of their opponents. Tutoring strategies therefore are necessarily a special case
of models possessing an opponent model. The balance in tutoring strategies is
delicate: on the one hand it is essential that the tutor has a good model of his
opponent. Yet it is also required that the give-away move be not so obvious as to
be noticeable by the person being tutored. Thereby, with the help of (D, d)-OM
search, the game is manipulated in the direction of an interesting position from
which the novice may find a good or excellent move "by accident"; the novice's
interest in the game may increase, stimulating his progress on the way towards
becoming a strong player.

Another possible application is to devise a cooperative strategy for
multi-agent games, such as soccer uni, 4-player variants of chess m and so
on. In such games, (D, d)-OM search can be used by the stronger player to
construct a cooperative strategy with his partner(s). Here, compared to the
weaker partner(s), the stronger one is a grandmaster, who can apply (D, d)-OM
search in order to model his partner(s)' play f|. One large advantage of such
cooperative strategies is that it is much easier to obtain a reliable partner model
than an opponent model.

7 Conclusions and Limitations 

In this paper, a speculative strategy for game-playing, (D, d)-OM search, is
proposed using a model of the opponent, in which the difference in search depths
is explicitly taken into account. The algorithm and characteristics of this search
strategy are introduced. A more efficient variant, named α-β² pruning, is also
presented. Experimental results with random-tree simulations and using Othello
confirm its effectiveness.

Although the opponent model used by (D, d)-OM search is more flexible than
that used by pure OM search, it is difficult to have a reliable estimate of the
search depth and evaluation function of the opponent. Mostly, the max player will
only have a tentative model of his opponent, and as a consequence this will lead
to a risk if the model is not in accordance with the real opponent's thinking
process. Whereas preliminary experiments indicated that the applicability of OM
search is greater for weaker opponents 0, more work will be needed to investigate
whether this holds also for (D, d)-OM search.

Another point for future research is the recursive application of (D, d)-OM
search, analogous to Carmel and Markovitch's 0 M* algorithm. Suppose we use
(4, 1)-OM search. In the present implementation the algorithm uses 2-MM search
to determine the max player's values at depth 2. A better exploitation of the
opponent's weakness would be to use (2, 1)-OM search then. The computational
costs for this extension should carefully be weighed against the benefits.

Abstract. Approaches to computer game playing based on (typically
α-β) search of the tree of possible move sequences combined with an
evaluation function have been successful for many games, notably Chess.
For games with large search spaces and complex positions, such as Go,
these approaches are less successful and we are led to seek alternative
approaches.

One such alternative is to model the goals of the players, and their strate- 
gies for achieving these goals. This approach means searching the space 
of possible goal expansions, typically much smaller than the space of 
move sequences. 

In this paper we describe how adversarial hierarchical task network plan- 
ning can provide a framework for goal-directed game playing, and its 
application to the game of Go. 



1 Introduction 

Most approaches to computer game playing are based on game tree search and 
position evaluation functions (data driven approaches). Data driven approaches 
are appropriate for games with small branching factors, and for which it is pos- 
sible to accurately assign values to positions which indicate who is winning. 
While this approach has been very successful for many games including computer 
Chess, it has been less successful when applied to games with high branching 
factors and complex positions, such as Go or for games with a high degree of 
uncertainty such as Bridge. 

An alternative to the data driven approach is goal driven search in which 
a single agent tries to satisfy its goals in the game. Goal driven search has 
been extensively explored in the AI literature, in particular as Hierarchical Task 
Network (HTN) planning [23. ESj- When multiple agents need to be modeled 
and can compete against one another this becomes adversarial planning. This 
paper presents an adversarial planning architecture for performing goal driven 
reasoning in games and describes its application to the game of Go. 



H.J. van den Herik, H. Iida (Eds.): CG'98, LNCS 1558, pp. 93-111, 1999.
© Springer-Verlag Berlin Heidelberg 1999

1.1 Goals and Data

Within a specific game, move or action choices often depend upon the state of 
the game, the phase of the game (e.g. opening, endgame etc.), the future actions 
of the opponent, the ability of a player to follow up an action appropriately 
and many other diverse factors. It is these interacting influences on the choice 
and effect of moves which make games so fascinating for human players and so 
challenging for machines. 

In computer game playing there are two main approaches to making move 
choices: 

— Data Driven: At each step, rules, patterns or heuristics are applied to the 
game state to suggest useful moves. The resulting set of plausible actions is 
then evaluated using search in the tree of moves. Each move is played out 
in a world model followed by the possible responses of the opponent. The 
search continues until the leaves of the tree are reached.¹ These leaf nodes
are then evaluated and used to select one of the original plausible actions as 
the one which leads to the most desirable (by some measure) set of leaf states. 

— Goal Driven: During play, a goal driven system keeps a number of abstract 
goals in an agenda. The goals in the agenda represent the things the system 
would like to achieve in the short, medium and long term. To choose a move, 
goals are expanded into plans (which are conjunctions of goals at lower levels 
of abstraction) and eventually into concrete moves. Repeated decompositions 
form a plan for achieving the goal. 

In a data driven search tree, each node represents a possible game position 
and has one branch for every legal move in that position. By contrast, each node 
in a goal driven search tree represents a plan for achieving the top level goal with 
some parts still sketchy (abstract goals) and others fixed (concrete actions), and 
there is one branch at the node for every way the system suggests to further 
refine the plan. 

Which approach (goal driven or data driven) is most advantageous is heavily 
dependent upon the domain, in particular on the size of the data driven and goal 
driven search trees. In Bridge, for example, the locations of the cards are not in 
general known during play, which leads to a large space of possible card plays 
and therefore a prohibitively large data driven search tree. Frank (PHI) shows 
that a goal driven approach can very successfully play bridge; a relatively small 
number of operators is sufficient to describe all the relevant plays. 

¹ Which nodes are the "leaves" can be variously defined by a depth cut-off point,
quiescence, or further domain-dependent heuristics.

1.2 The Search Space in the Game of Go

The game of Go is considered by many to be the next great challenge for
computational game-playing systems. It presents new, significant and different
challenges from those of Chess, which has long been considered the "task par
excellence" for AI (Berliner 0). Go's search space is both wider and deeper than
that of Chess; the size of Go's search space is estimated to be 10^170 states
(cf. Chess ≈ 10^50), games last approximately 300 moves (cf. Chess ≈ 80) and
the branching factor at each turn is approximately 235 (cf. Chess ≈ 35). It is
often hard to evaluate the relative strength of Go positions during play. We
therefore expect that the brute-force game tree search which has been so
effective for Chess will have much greater difficulty with Go.

Due to space limitations, we do not include the rules of Go in this document. 
A good introduction to the game can be found in, for example, )3]. 

1.3 Approaches to Computer Go 

Although Go has received far less attention than Chess in terms of research, 
there have been many varied approaches to computer Go. A good summary can 
be found in ■ The programs which can play the whole game most successfully, 
such as Go4++ and Many Faces of Go Pi are primarily data driven but also
employ other techniques. Gogol {jjjjj is able to learn patterns for play and various 
non-symbolic techniques have been used to learn/evolve Go playing controllers 
(see f2Bj for more references). Important techniques have also been developed 
for playing part of the game, in particular focusing on “Life and Death” El, 
j2'8lj , and using combinatorial game theory to play the endgame m- 

Despite the success of current approaches there is recognition that there is 
still substantial room for improvement. The advantages mentioned in fjH| and 
earlier work pm along with the success of this approach in other domains 
(notably bridge JZ2j) suggest that a goal driven approach may be useful. It 
also has much psychological validity, since protocol analysis indicates that Go 
players consider few candidate moves, and concentrate on their own and on their 
opponents’ purposes PI- Finally, even in data driven approaches to computer 
Go, it is still necessary to consider high-level goals, for example in order to decide 
whether or not a satisfactory result has been achieved in life and death problems 
(e.g. some strings may be allowed to die but others must live) |2H]- 

1.4 Applications of Adversarial Planning 

Early attempts to use goal driven reasoning in adversarial domains include PI 
and p| • The work by Pitrat (P!) was extended by Wilkins in P! to produce the 
successful Paradise system for Chess. More recent work includes (battlefield 
management) and [31 )[ (command and control). The most recent work is by 
Smith et al. on bridge (described in and IP) which presents a goal driven
system for bridge declarer play (Tignum2) good enough to beat the current top 
commercial computer player. 

There have also been several attempts to apply this kind of technique to
Go. Early Go planners due to Sander and Davies [2E| and Lehner PI suffered
from the fact that they only work in very open positions and provide vague,
high-level plans. These early systems have difficulties in complex tactical
situations, to a large extent due to the difficulty of modeling the interactions
between high-level goals, which must be done for effective HTN planning. One of
the crucial differences in the framework presented here is the use of
linearisations to produce a total order planner and the use of a world model to
track these interactions.

2 An Adversarial Planning Architecture 

This section describes an Adversarial Planning Architecture which models goal
driven reasoning for adversarial domains.² The goal driven approach and use
of abstract plans is motivated by work on Hierarchical Task Network (HTN)
planning. HTN systems were first used in Noah PH and Interplan pa and
have since been extensively studied in the AI planning field. Erol et al. (0)
give a complete definition for an HTN scheme and present UMCP, which is a
provably sound and complete HTN planner and provides a good template for
this type of system.

2.1 Principles of HTN Planning 

HTN planning is based on three types of object: Goals , Operators and Plan 
Schemas. Operators are actions which can be performed in the world (such as 
flicking a switch, taking a step). Goals are more abstract and express aims in 
the world such as “Go to the Moon”, “Become Prime Minister”. Schemas (also 
called Task Networks), specify the subgoals which must be achieved in order to 
satisfy the goal. For example, the following schema expresses the fact that G can
be achieved by satisfying the conjunction of subgoals G_1, G_2 and G_3:

G → G_1 + G_2 + G_3

The G_i should be at a lower level of abstraction than G, and can generally
be satisfied in any order. Operators are at the lowest level of abstraction.

Given these three types, HTN planning starts by taking a starting world 
state and a set of goals which form the initial abstract plan, which is then refined 
step by step by expanding the goals within it. Goals are expanded by selecting a 
schema with the chosen goal as the antecedent (the $G$) and replacing the instance
of $G$ in the current plan by the subgoals (the $G_i$) listed in the consequent of the
schema. As the planning process continues, interactions, incompatibilities and 
conflicts may arise between combinations of goals. These “interactions” in the 
plan must be resolved, which can result in backtracking and (in partial order 
planners) ordering constraints between goals. 

The process is complete when all goals have been expanded into sets of oper- 
ators and all arising interactions have been resolved. The sequence of operators 
thereby generated should, upon execution in the initial world state, lead to the 
achievement of the initial goals in the world. 
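The refinement loop described in this section can be sketched in a few lines of Python. This is a minimal illustration only: the goal and operator names are hypothetical, schemas are tried in listed order, and the resolution of interactions between goals is left out entirely (a goal with no applicable schema simply forces backtracking).

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy domain: these names are illustrative, not taken from the paper.
SCHEMAS: Dict[str, List[List[str]]] = {
    "G": [["G1", "G2", "G3"]],           # G -> G1 + G2 + G3
    "G1": [["op-a"]],
    "G2": [["op-x", "op-b"], ["op-c"]],  # two alternative schemas for G2
    "G3": [["op-d"]],
}
OPERATORS: Set[str] = {"op-a", "op-b", "op-c", "op-d"}


def refine(plan: List[str]) -> Optional[List[str]]:
    """Expand the first abstract goal in `plan`, backtracking over its schemas."""
    for i, item in enumerate(plan):
        if item in OPERATORS:
            continue
        # `item` is a goal: try each schema whose antecedent is this goal.
        for subgoals in SCHEMAS.get(item, []):
            result = refine(plan[:i] + subgoals + plan[i + 1:])
            if result is not None:
                return result
        return None  # no schema succeeded: the caller must try another choice
    return plan      # only operators remain, so the plan is fully refined


print(refine(["G"]))  # ['op-a', 'op-c', 'op-d']
```

The first schema for G2 fails because `op-x` is neither an operator nor a goal with a schema, so the sketch backtracks to the second schema.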

The extension of this idea into adversarial domains is non-trivial since plans
are no longer sequences of actions but trees of contingencies which take into
account the actions of opponents. The interactions in the plan are considerably
more complex and serious since the goals of opponents in the world are often
conflicting and since the agents are non-cooperative. HTN planning for adversarial
domains is computationally considerably more complex than HTN planning
in standard domains.

2 More details can be found in [?].



2.2 Planning Framework 

The adversarial planner presented here models two agents (named Alpha and
Beta) which represent two players (adversaries) in a game (the framework can
be generalised to more than two players). Each agent keeps an open agenda
of goals which represents its current plan of action. To solve a problem in a
domain, each agent is given an abstract goal (or set of goals) to achieve. The
agents then attempt to find sequences of moves which satisfy their goals. Since
the goals of the agents are usually contradictory and the agents must take turns
in performing actions, their interaction in trying to satisfy their goals can be
used to find a plan for the situation.³




Fig. 1. Planning steps alternating between two agents. 



The system allows the two agents to take control of the reasoning apparatus
in turns. Once an agent has control it expands some of its abstract goals until
it is able to decide upon a concrete action. The chosen action is then performed
in a world model⁴ before control is passed to the other agent. Figure 1 shows
the flow of control during the reasoning process. An agent may need to expand
several abstract goals before being able to decide upon an action in the world.
During this “active” period it uses its own agenda of goals and has control of
the shared reasoning apparatus. Once an action is chosen, control passes to the
other agent. Agent Alpha models the player who is next to move in the game and
agent Beta the opponent; thus the planner is trying to plan for Alpha’s move
(Alpha takes control first).

3 See below for how this helps choose moves.

4 A world model is not a standard feature of HTN planners; see [?] and [?] for
more explanation of its use.
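The alternation of control can be sketched as follows. This is a simplified illustration, not the system's actual control code: the `Agent` class, the `SCHEMAS` table and the `play:` prefix for concrete moves are assumptions made for the example, each goal has a single schema, and backtracking is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical one-schema-per-goal knowledge; concrete moves are written "play:<point>".
SCHEMAS: Dict[str, List[str]] = {
    "kill-group": ["surround-group", "play:X"],
    "surround-group": ["play:B"],
    "save-group": ["play:1", "play:2"],
}


@dataclass
class Agent:
    name: str
    agenda: List[str] = field(default_factory=list)

    def choose_action(self) -> str:
        """Expand abstract goals from the agenda until a concrete action appears."""
        while self.agenda:
            item = self.agenda.pop(0)
            if item.startswith("play:"):
                return item
            self.agenda = SCHEMAS[item] + self.agenda  # replace goal by its subgoals
        raise RuntimeError("agenda empty")


def run(alpha: Agent, beta: Agent) -> List[str]:
    world_model: List[str] = []            # actions performed so far, in time order
    active, waiting = alpha, beta          # Alpha takes control first
    while active.agenda:                   # stop once the agent in control has no goals left
        action = active.choose_action()
        world_model.append(f"{active.name} {action}")
        active, waiting = waiting, active  # pass control to the other agent
    return world_model


print(run(Agent("Alpha", ["kill-group"]), Agent("Beta", ["save-group"])))
```

Each agent may perform several expansions during its active period, but exactly one action is recorded in the world model before control changes hands.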

At any one time an agent has a plan which consists of actions already taken
(square boxes in figure 2) and goals at various levels of abstraction (circles in
figure 2). The actions (squares) are represented in the world model; the abstract
goals (circles) are held in the agenda.

A planning step involves selecting an abstract goal (such as X in figure 2) and
expanding it. To do this a plan schema is selected for X which expresses how X
could be achieved using a conjunction of subgoals at a lower level of abstraction.
In figure 2, X is replaced in the plan by the two subgoals X1 and X2. Once
expansion has reached the lowest level of abstract goals, these lowest level goals
need to be shown to be already true or replaced by actions which make them
true.




Fig. 2. Plan refinement: abstract goals are expanded to and replaced by sets of 
subgoals at lower levels of abstraction. 



Once one of the agents (Alpha say) has achieved all of its goals (been able 
to perform actions in the world model which make them true) it knows that it 
must have satisfied its top level goals (since all its subgoals are strictly descended 
from these). The other agent is made aware of this and, since in general a good 
outcome for one agent is a bad outcome for the other, both agents are allowed 
to force backtracking. Agents are allowed to backtrack to any of their previous 
goal or expansion choices but only to their own decisions. Neither agent may 
force the other to change plans directly. 

The backtracking activity explores the various interacting plans Alpha and
Beta have for the situation and creates a plan tree as shown on the left of
figure 3. Each choice made by an agent creates a new branch. Underlying the







Fig. 3. The plan tree on the left is reduced to the contingency tree on the right 
by dropping the abstract reasoning nodes. 



plan tree is the contingency tree which is found by removing all the abstract goal
decomposition steps in the plan tree to leave only the operators/actions (shown
on the right in figure 3). The final contingency tree acts as a form of proof that
the first move is a good step towards achieving Alpha’s goals. Hence it supports
the choice of the first move in the tree.⁵ In general, the final contingency tree
contains only a small subset of the move tree which would be generated by
considering all the available options at each turn as in a data driven approach.
(See below.)
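The reduction from plan tree to contingency tree amounts to splicing out every abstract node and attaching its children to the nearest action above it. The sketch below shows one way to do this; the `Node` type and the example tree are hypothetical and chosen only to illustrate the idea.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    is_action: bool = False  # True for moves, False for abstract goal expansions
    children: List["Node"] = field(default_factory=list)


def contingency(node: Node) -> List[Node]:
    """Return the action-only subtrees that replace `node`."""
    kept: List[Node] = []
    for child in node.children:
        kept.extend(contingency(child))
    if node.is_action:
        return [Node(node.label, True, kept)]
    return kept  # abstract node: drop it and pass its reduced children upward


# Hypothetical plan tree: an abstract expansion leads to move B, after which
# Beta has two alternative replies, each reached through its own goal expansion.
plan_tree = Node("expand kill-group", children=[
    Node("B", True, [
        Node("expand save-group (plan 1)", children=[Node("1", True)]),
        Node("expand save-group (plan 2)", children=[Node("2", True)]),
    ]),
])

root = contingency(plan_tree)[0]
print(root.label, [c.label for c in root.children])  # B ['1', '2']
```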

2.3 Discussion 

Moves in the contingency tree are directly descended from the goals of the two 
agents, and the tree structure naturally reflects the interactions between the two 
adversaries. Taking any branch, the moves chosen near the leaf (at the end of a 
move sequence) are still directly related to the same plan and aim that suggested 
those moves near the root (the first moves in the tree). 

A key difference from standard HTN planning is that goals are expanded in
time order using a linearisation. Such a linearisation turns the planner from a
partial order into a total order planner and allows the use of a world model. Figure 2
illustrates how the least abstract goals/actions appear first in the sequence.
As soon as goals reach the lowest level of abstraction (become operators) they
can be added to the world model. This is a significant difference from standard
HTN planners; its importance is discussed further in [?] and [?].

5 The tree itself can also be used to respond to any of the opponent’s moves which
are represented in it, but re-planning may be required if other moves are made.
The question of how long (for how many moves) such a plan remains valid is not
addressed here.

3 An Adversarial Go Planner

The planning architecture was instantiated as a Go reasoning system called 
Gobi both to test the architecture in a challenging domain and to investigate 
the usefulness of the goal driven approach for Go. Gobi consists of a set of 
knowledge modules which plug into the planning architecture. The knowledge 
modules provide the Go domain knowledge, plan schemas and goal types which 
the reasoner can use to solve problems. 




Fig. 4. Example 1: Black to play and kill the white group. 



3.1 An Example Go Plan 

Figure 4 shows a problem from Volume I of “Graded Go Problems for Beginners”
(Yoshinori [?]). The aim is for black to move first and kill the white group of
stones. The task is specified to Gobi as two abstract goals: the goal kill-group
for agent Alpha (playing black) and the goal save-group for agent Beta (playing
white).⁶ Agent Alpha takes control, first decomposing the kill-group goal using
one of the available plan schemas. An abstract plan for killing this group might
be a conjunction of the following subgoals:⁷

— surround-group - stop the group from running and connecting.

— squeeze-space - reduce the space the group has to live.

— prevent-eye-formation - block any attempt by the group to make eyes.⁸

One of these subgoals is then expanded further to the next level and so on
until at the lowest level in the hierarchy a move such as play at B is chosen to
satisfy a simple goal such as prevent-escape-at-2 (figure 4).

6 Note: the goals need not be directly opposing. 

7 This abstract plan is quite intuitive. It is not obvious how a data driven system 
would represent the equivalent of such a plan. 

8 An eye in Go is an enclosed space where the opponent may not play - a group with
two eyes is unconditionally alive.

Alpha plays this move onto the board in the world model which gives the
new world state for Beta to work with. Alpha still has a set of goals at various
levels of abstraction remaining in its agenda. These remaining goals represent
the plan on how to follow the first move, i.e. which other subgoals/actions need
to be achieved to make the plan complete. To validate that this first move is
good (in this case playing at B would not be), Alpha must eventually show that
all these goals can be achieved no matter what Beta does. These remaining goals
are kept by Alpha until after Beta’s turn.

Beta now begins by expanding its single goal save-group in the context of the
new board position (after Alpha playing at B in Figure 4). A possible plan for
this goal is:

— make-eye-space. 

— make-eyes (try to form two eyes). 

After Beta’s move (whatever it is) is played into the world model, control is 
returned to Alpha which then tries to satisfy the rest of its goals. The interleaving 
of goal expansions by the two agents continues until one is able to satisfy all of 
its aims (and thus implicitly its main aim). The opposing agent is informed of 
this and it then backtracks to explore any alternative options it has which might 
produce a better outcome for itself. In this way a contingency tree is generated 
which either proves or refutes the validity of the first move (Alpha’s). 




Fig. 5. Gobi plays at X and this kills the group. 



For this example (figure 4) Gobi returns the move at X in figure 5, which
kills the group. Among the defences tried by Beta are trying to run out at 2
and counter-attacking by playing at 1 (this puts the single black stone C under
threat). Since all the moves tried by both agents must be part of a plan of action,
the number of possible moves searched is very small compared to the number of
available moves in even this small problem.

3.2 Representing Go Knowledge 

The planning architecture and Go knowledge modules which make up Gobi are all
written in Common Lisp. Around 1400 lines of code make up the plan knowledge
(i.e. schemas and goal types) Gobi has. Writing a full-board Go-playing program
is a significant exercise in knowledge engineering, so to enable us to add enough
depth in knowledge to do useful testing in a short time, Gobi’s knowledge is
focused on the area of killing and saving groups.⁹ The knowledge base is made up
of 45 goals at five different levels of abstraction. The average number of possible
plans per goal is approximately two (thus the knowledge has a relatively low
branching factor). The two highest level goals available in Gobi are kill-group
and save-group, which emphasises the focus on life and death problems.

The following example goal taken from Gobi’s knowledge base illustrates the 
structure of the knowledge the program holds (the save-group goal was also men- 
tioned in the previous example). Note that the plan knowledge is not complete 
(for example making eyes is not the only way of following up a counter attack);
more work is needed to extend the knowledge base.

GOAL: save-group,
LEVEL = 5,
Plan 1 - Find Eyes:
    *make-eye-space,
    *make-eyes.
Plan 2 - Escape Group:
    *running-sequence,
    *secure-escape.
Plan 3 - Counter Attack String:
    *locate-vulnerable-string,
    *kill-string,
    *make-eyes.

In turn the make-eye-space goal from Plan 1 has two alternative plans:

GOAL: make-eye-space,
LEVEL = 4,
Plan 1 - Ambitious Extend:
    *large-extending-move,
    *consolidate-space.
Plan 2 - Creep extension:
    *creeping-extending-move,   //A single
    *consolidate-space.         //step extension.

The structure of the knowledge shown here is intuitive for Go and is very different
from the kind of heuristic information added into data driven approaches. The
knowledge represented can be seen as an AND-OR tree with the AND component
represented in the conjunction within the plan schema and the OR component
in the choice between alternative plans. Goals within plans are not evaluated
at every step: for simple goals truth is established using a simple test or by
checking whether an appropriate move can be made. For higher level goals truth
is inferred from the fact that the goals descended from them were achieved.

9 Note however that this does not mean Gobi is limited to enclosed problems (see section 4.2).
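The AND-OR reading of the knowledge base can be made concrete with a small sketch. The goal and plan names are those of the save-group and make-eye-space listings; the dictionary layout, the `achieved` function and the rule that any goal without a listed plan is decided by a simple test are assumptions made for illustration (Gobi itself is written in Common Lisp and its internal representation is not shown here).

```python
from typing import Callable, Dict, List

# Goal -> ordered list of alternative plans (OR); each plan is a conjunction (AND).
KNOWLEDGE: Dict[str, List[List[str]]] = {
    "save-group": [
        ["make-eye-space", "make-eyes"],                           # Find Eyes
        ["running-sequence", "secure-escape"],                     # Escape Group
        ["locate-vulnerable-string", "kill-string", "make-eyes"],  # Counter Attack String
    ],
    "make-eye-space": [
        ["large-extending-move", "consolidate-space"],             # Ambitious Extend
        ["creeping-extending-move", "consolidate-space"],          # Creep extension
    ],
}


def achieved(goal: str, simple_test: Callable[[str], bool]) -> bool:
    """A goal with plans holds if some plan (OR) has all of its subgoals (AND) achieved.
    Goals without plans are treated as lowest level and decided by `simple_test`."""
    plans = KNOWLEDGE.get(goal)
    if plans is None:
        return simple_test(goal)
    return any(all(achieved(sub, simple_test) for sub in plan) for plan in plans)


# Hypothetical set of simple goals that currently hold on the board.
facts = {"creeping-extending-move", "consolidate-space", "make-eyes"}
print(achieved("save-group", facts.__contains__))  # True, via Find Eyes and Creep extension
```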

3.3 The Planning Process 

Currently Gobi has no goals and plans which persist between game moves and 
top level goals are given to the planner for each position to solve. In order to 
extend Gobi to whole-board play, one of the most important tasks is to define 
goals and abstract plans above the current highest level. The planner may keep 
these goals open (and still unexpanded) from one turn to another and indeed for 
long periods of the game. Earlier work [?] has already shown the usefulness
of planning at the higher strategic level; Gobi aims to be useful and applicable
at all levels.

As described in section 2.2, during the reasoning process a planning step involves two
steps: 1) choosing a current goal to replace in the plan and 2) choosing an
appropriate plan to replace it with. In Gobi the choice of which goal to expand
next is unordered; however, after choosing a goal from the agenda Gobi will try
to work on it down through several levels of expansion, repeatedly choosing
one of its descendants for expansion. This leads to some parts of the plan being
expanded quickly (so they can be tracked in the world model) while others
remain abstract for longer. Once a goal has been chosen, one of its associated
set of plans must be selected for use. The plan schemas in Gobi are tried in a 
fixed order designed to try the most promising goal decompositions first. Plan 
preconditions can also be used to screen out plans not thought to be suitable for 
the current situation. If one plan leads to a dead-end, backtracking allows Gobi 
to try the others available for the situation. 

The plans stored in the Go modules are not pre-programmed solutions and 
are expanded in the context of the current game state. The development of a 
plan is influenced by schema preconditions and by the choices in the world for 
making the lower level goals true. The failure of goals early in the plan quickly 
forces choice of alternative subplans for making these goals true. 

The expansion of goals into plans eventually leads to goals at the lowest level
of abstraction. These need to be checked for satisfaction and used to choose
moves. Some example lowest level goals are: fill-a-liberty, play-a-hane-move (near
here), play-a-connecting-move (between string1 and string2), play-a-placement-move,
play-a-blocking-move (near here), play-an-extending-move (near here).
The cost of move choice can be divided into three components:

1. Abstract planning steps: these are very cheap since they consist purely of
two cycles of matching items in a list (choose a goal, choose one of its plans).
They can be more expensive if complex pre-conditions are used to select
between plans.

2. Checking the satisfaction of low level goals: since the goals are very focused
(are these two strings connected? could black run out here?) this is also not
too expensive. Checking is made easier by the fact that it can be supported
by various representations of the game state (strings, groups, influence etc.).

3. Using low level goals to generate moves: this is the most expensive part
of the process although again the goals by this stage are very focused and
limited to a small area (the “here” above). In Gobi selection is done using
simple rules which define the type of move which could satisfy a goal.¹⁰ An
example set of rules is that a connecting move must:

(a) be a liberty of (next to) string1 and

(b) be a liberty of (next to) string2.
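Rules (a) and (b) amount to intersecting the liberty sets of the two strings. The following sketch illustrates this on a toy board; the board representation (a dictionary from points to colours) and the function names are assumptions made for the example, and no legality or ko checking is attempted.

```python
from typing import Dict, Set, Tuple

Point = Tuple[int, int]


def liberties(string: Set[Point], stones: Dict[Point, str], size: int) -> Set[Point]:
    """Empty on-board points orthogonally adjacent to any stone of `string`."""
    libs: Set[Point] = set()
    for (r, c) in string:
        for p in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= p[0] < size and 0 <= p[1] < size and p not in stones:
                libs.add(p)
    return libs


def connecting_moves(string1: Set[Point], string2: Set[Point],
                     stones: Dict[Point, str], size: int) -> Set[Point]:
    """Candidate moves satisfying rules (a) and (b): liberties shared by both strings."""
    return liberties(string1, stones, size) & liberties(string2, stones, size)


# Hypothetical 5x5 position: two black strings separated by a single empty point.
stones = {(2, 0): "b", (2, 1): "b", (2, 3): "b", (1, 2): "w"}
s1, s2 = {(2, 0), (2, 1)}, {(2, 3)}
print(connecting_moves(s1, s2, stones, 5))  # {(2, 2)}
```

Because the goal names the two strings involved, only their immediate neighbourhood is examined rather than the whole board.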

Although the workload is divided between these three parts it is clear that
this move choice method is in general more expensive for choosing a single move
to try in the move tree than most data driven methods (see section 6.2 for some
estimates). The potential gain of using this method is in reducing the number
of candidate moves in the move tree which need to be tried. There are two
important tradeoffs in the planning process:

— If goal selection and expansion mechanisms get more complex (i.e. through 
the extensive use of complex preconditions) their time cost will increase, 
however it should also mean a better use of the knowledge available. 

— The simpler and more specific the lowest level goals are, the easier it is to 
establish their truth and choose actions with them but the more planning is 
needed to decide upon them. 

3.4 Mixing Goal and Data Driven Approaches 

Although the work here focuses on goal driven approaches to Go, it is clear that 
human players mix both types of reasoning. Patterns are thought to play a large 
part in human Go play. Thus, there is a need to consider how the two approaches 
can be mixed successfully. 

One of the advantages of using a world model in the planner is that the 
changing situation of the state during planning is reflected in the world model. 
The changing world state may highlight interesting opportunities (or obvious 
problems) which arise as a result of the plan actions but were not expected 
effects. The architecture described above was extended to include plan critics 
which have access to the world model and watch for important situations. The 
critics are able to insert goals into the agendas of the agents for the planner 
to consider alongside the current track of reasoning. Thus the planner is made
aware of opportunities, for example, during planning and can react to them.
Such opportunistic reasoning seems a plausible mechanism in terms of human
play. Plans suggest moves which might lead to unexpected opportunities which
in turn might lead to new plans etc.

10 Note this could just as well have been done with local pattern matching. The
planning framework poses no restriction on how this relationship between abstract
goals and concrete actions is established (in general this is domain dependent).

Two critics were added to Gobi which detect groups under immediate threat 
of capture and insert (optional) group-saving/group-attacking goals into the 
agents’ agendas. 
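A critic of this kind can be sketched as a function which inspects the world model and appends optional goals to the agendas. Everything in the sketch is an assumption made for illustration: the world model is reduced to a table of groups with liberty counts, "immediate threat of capture" is approximated as having exactly one liberty, and the two critics are folded into a single function.

```python
from typing import Dict, List, Tuple

# Hypothetical world-model summary: group name -> (owning agent, number of liberties).
WorldModel = Dict[str, Tuple[str, int]]


def capture_threat_critic(world: WorldModel, agendas: Dict[str, List[str]]) -> None:
    """Insert optional goals for groups which have exactly one liberty left."""
    for group, (owner, libs) in world.items():
        if libs != 1:
            continue
        attacker = "Beta" if owner == "Alpha" else "Alpha"
        for agent, goal in ((owner, f"save-group {group}"),
                            (attacker, f"kill-group {group}")):
            if goal not in agendas[agent]:
                agendas[agent].append(goal)  # considered alongside the current goals


agendas = {"Alpha": ["surround-group W1"], "Beta": ["make-eyes W1"]}
world = {"W1": ("Beta", 3), "B2": ("Alpha", 1)}
capture_threat_critic(world, agendas)
print(agendas)
```

In this example Alpha's group B2 is under threat, so Alpha receives a saving goal and Beta an attacking goal, while the existing goals concerning W1 are left untouched.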

4 Testing and Evaluating GoBI 

Yoshinori’s four-volume series provides an excellent source of problems for
testing Go programs. Since setting up each test problem was time-consuming,
we chose as test set I a representative sample (85 problems, approximately one
third) of the problems from volume I [?]. Problems were chosen for the test
set essentially randomly, with a bias towards harder problems and excluding the
simple capturing problems at the beginning of the book (which were considered
too easy). A second set of tests using problems from volume II [?] and [?] was
also conducted. All the tests were limited to 30 seconds of runtime on a SUN
UltraSparc.

4.1 Test Results 

The system successfully solved 74%¹¹ of the examples in test set I, which is a
significant achievement given its limited knowledge and the complexity of some
of the problems. Gobi has also solved considerably harder problems from volume
II [?] and [?]. These tests were not comprehensive however, and used a range of
hand picked examples, so the most useful indicator of performance remains the
performance on problems from volume I [?].

In 98% of the correct solutions, Gobi considered most of the significant de- 
fences or responses of the opponent in its plan. This statistic is encouraging since 
not only were the answers correct — so was the reasoning behind them. Most 
of the failures were due to incomplete plan knowledge: several problems relied
on groups connecting and escaping to live, for example, which is something Gobi
currently has no plans for (Gobi has no plans for cutting groups). Another weak
area was in problems which required a lot of forcing moves (e.g. ladders). Gobi 
has no special way of handling these and so is not able to take advantage of the 
problem simplification they provide (they are planned for in the normal way). 

Curiously, strengthening Gobi’s defensive knowledge led to an improvement
in attacking plans and vice versa, reflecting the fact that a better opponent
model is more likely to find refutations for poor attacking plans.

4.2 Discussion 

Further useful tests would involve comparing solutions with those of current
programs such as Gotools and Many Faces of Go; however, we currently
have access to neither of these. A second difficulty is that comparisons
would need to be made as to the number and relative cost of steps taken (search
or reasoning) since our current system is a prototype and the other two systems
are commercial with several years of development.

11 67% when critics were disabled.




Fig. 6. Example 3: Gobi solves this more open position, killing the white string 
marked A by playing a net move at 1. Adding white stones at X or Y for example 
(or various other places) would cause Gobi to realise that the string can no longer 
be killed directly. 



Even though most of Gobi’s knowledge is about killing and saving groups
it is more general than specialist life-and-death programs: knowledge for other
parts of the game can be freely added and Gobi is also able to handle more open
positions (which for example Gotools has trouble with [?]) such as the one
shown in figure 6. In fact one would expect Gobi to have more of an advantage
in open (though still tactical) positions where the number of possible move
sequences increases rapidly. The plan knowledge in the system would then focus
on only the relevant parts of the move tree.



5 Relationship to Other HTN Systems and Adversarial 
Planners in Go 

Since what was probably the first use of the notion of goal directed search by
Reitman and Wilcox [?] in 1979 there have been several attempts at applying
planning approaches to Go.

Work by Ricaud [?] has similarities to that presented here although it focuses
more on the use of abstraction. The Gobelin system uses an abstract
representation of the Go game state to produce plans which are then validated in the
ground representation. Gobi is more flexible than this in that it shifts between
several levels of abstraction (five with current knowledge) dynamically during
planning rather than having a generation/validation cycle. Gobelin probably
also suffers in a similar way to [?] and [?] in that it has difficulty in
representing complex interactions in the abstract representation. This difficulty may
explain why Gobelin makes the strongest plans during the opening where play
is very open and there are few complex tactical situations. In his PhD thesis
Hu [?] again focuses mainly on the strategic level of Go play, addressing
issues of combining individual strategic and tactical goals into goals with multiple
purposes.

The use of a world model to track the interactions between goals and making
linearisation commitments during planning seems essential to any progress in
domains with high levels of interactions between player goals. The utility of a
world model has also been noted in other domains, notably Bridge [?], [?],
and makes Gobi much more applicable to tight tactical situations which can
arise in Go (which was the major stumbling block in previous work [?]).
Gobi also uses an intuitive planning structure with the planning engine clearly
separated from the knowledge.

The architecture probably owes most to Wilkins’ Paradise Chess system
[?] in the way that it uses plans to guide move choice. The use of the world
model allows a similar access to the traversal of the search tree used in Paradise.



6 Advantages of the Goal Driven Approach 

The goal driven approach which is presented here has some clear advantages for 
Go and other similar games. Together with some of the previous work on the 
usefulness of planning at the strategic level of Go, Gobi shows that this approach 
can be used for reasoning at all levels, moving transparently between them and 
providing a unifying framework for Go play. 



6.1 Representation and Communication of Domain Knowledge 

Go knowledge in books and folklore is often expressed in a form appropriate for 
encoding as decompositions of abstract goals into other goals at different levels 
of abstraction. As reported in [?], there is a rich vocabulary which Go players
can use to express their reasons for making a move. There are many popular 
proverbs, which convey strategies at various levels of abstraction, for example 
“death lies in the hane” is more tactical, whereas “don’t push along the fifth 
line” is quite strategic. It may be easier to add this kind of knowledge to a goal 
driven system than to a data driven system which requires the delicate balancing 
of heuristics. 

By following the trace of the goal decompositions one can see why the Go 
player is trying to do something — its aims and plans. Access to aims and plans 
is not only helpful for adding knowledge and debugging, but could be useful in 
the context of a Go tutoring aid. Some of the current commercial Go systems 
(Many Faces of Go for example) have teaching mechanisms but it is not clear 
whether these are based directly on the reasoning process of the computer player.

6.2 Search Properties

Using a goal driven approach leads to a very different search from a data driven
approach. Some key advantages are:

— There is no longer a need for global evaluation functions: this reduces to
checking if goals can be satisfied. Evaluation functions may still be used in
a limited way to carry out judgements which cannot be made on a move-by-move
level, for example using an influence function to judge whether or not
a group can successfully run, or using a fast α-β search to determine the
life or death of enclosed groups. When an evaluation function is employed in
determining whether or not a goal has been satisfied, the goal can be used
to provide a focus for the evaluation function to a limited area of the board.
The use of goals to focus evaluation makes sense when thinking about how
humans play Go, often making instinctive judgements (evaluations) of a
local situation but rarely considering (evaluating) the whole board.

— Quiescence is defined automatically, thus avoiding problems such as the hori- 
zon effect in search. Since each agent has an agenda of goals, a situation can 
be defined as “open” (unsettled) until all goals are achieved. 

— What seem like heuristically bad moves (e.g. sacrifices) are not discriminated 
against because the only consideration is their value to the plan. 

Whilst these are important advantages, a further strong point often associated
with goal driven approaches is their resilience in domains with large search spaces
and branching factors. Move choices are driven from above and not by the options
available at each choice point. The plan knowledge therefore focuses on which of the
choices are directly relevant to the current plan. The problems Gobi was tested
on gave a reduction in the number of move tree nodes visited of 10 to 100 times
compared to standard α-β search. The current top Go programs are able to
use a great deal of pattern and heuristic knowledge to improve on α-β; however,
these gains are still significant.

The number of nodes visited in the move tree does not tell the whole story
since Gobi requires more reasoning to choose which nodes to visit. This reasoning
consists of the three components identified in section 3.3 (expansion of a goal to a
further abstract plan, checking if a low level goal is satisfied, choosing a move
given a low level goal). For the examples in test set I (see section 4) there were on
average 4 planning steps per move chosen (including the steps which choose
moves). Initial analysis suggests that given Gobi’s current knowledge, each move
tried in the game tree involves an average overhead of 2.5μ in reasoning, where
μ is the average cost of modifying the internal data structures to make a move
(including checking for kos, removing any stones killed, checking legality etc.).
Thus the cost of choosing a move in Gobi is about 3.5 times as much as in α-β
(since α-β’s main cost is adding stones to the board, i.e. μ). The computational
advantages of Gobi over α-β are still clear-cut since the number of move choices
(tried in the move tree) is significantly less than for α-β. Note also that any
system which begins to use heuristics for move choice also begins to increase the
cost of each individual move choice. A more direct comparison with the systems
able to apply large amounts of heuristic information to improve upon α-β would
be useful but we do not have access to the relevant systems. The figures above
at least illustrate how the plan knowledge in Gobi is able to cut down the search
space by reasoning from above.
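Taking the figures quoted in this section at face value, the net saving can be estimated with a short calculation. The symbols $N_{\alpha\beta}$ and $N_{G}$, the numbers of move tree nodes visited by α-β and by Gobi, are introduced here for illustration only; the estimate assumes a per-node cost of μ for α-β and (1 + 2.5)μ for Gobi, with the node ratio between 10 and 100:

```latex
\frac{\text{cost}_{\alpha\beta}}{\text{cost}_{\text{Gobi}}}
  = \frac{N_{\alpha\beta}\,\mu}{N_{G}\,(1 + 2.5)\,\mu}
  = \frac{1}{3.5}\cdot\frac{N_{\alpha\beta}}{N_{G}},
\qquad
\frac{10}{3.5} \approx 2.9,
\quad
\frac{100}{3.5} \approx 28.6
```

Under these assumptions the total work would be roughly 3 to 29 times lower than for standard α-β on the tested problems, which is a rough indication rather than a measured result.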



7 Disadvantages of Goal Driven Search 

Obviously the goal driven approach is not always the best choice and has its 
own difficulties: 

— The goal driven approach requires significant effort to encode strategies as 
goal decompositions. By contrast, in the data driven approach, good play 
can be achieved using even relatively simple evaluation functions as long as 
the game tree can be searched deeply enough. 

— There are some types of knowledge which are hard to express in a goal/plan
oriented framework, such as knowledge which is not reliant on understanding
the motivation behind a move (patterns for example). It seems clear that
good Go play requires both pattern (data driven) and abstract plan (goal
driven) knowledge which is what leads us to try and integrate the two
approaches (see section 3.4).

— For games with low branching factors, shallow search trees or where near
exhaustive search is possible data driven approaches have a strong advantage.
It is only when searching most of the move tree is infeasible and large
amounts of knowledge are needed to prune the tree that goal driven approaches
can gain the upper hand.¹²

As with all knowledge based approaches, the search space is determined by 
the knowledge in the system, so large amounts of knowledge can dramatically 
impact performance. The key to making knowledge based approaches work is 
good organisation of that knowledge. Having many overlapping plans for exam- 
ple would cause serious performance problems, since each would be evaluated 
in turnO In general there is little problem in adding “breadth” of knowledge 
(e.g. plans for different parts of the game) since knowledge is often obviously 
not applicable or leads to failure quickly. Adding more detailed plans (“depth” 

12 This point was illustrated by Paradise m in the early eighties which despite being 
able to solve Chess problems requiring search up to 20 ply deep (far beyond other 
Chess programs of the time), still saw its knowledge based approach outstripped by 
the ever increasing efficiency of fast search techniques. 

13 Some moves may be chosen twice for different reasons (in different plans), which is 
perfectly acceptable; however, if many plans overlap, long sequences of move choices 
could be the same. 
in knowledge) needs to be done carefully however since this can lead to overlaps 
and redundant search. The levels of abstraction used in Gobi (and in many AI 
planners) are key in avoiding this since knowledge can be appropriately struc- 
tured and similar plans can be grouped together and expressed in a compact 
form. 

The main problem with Gobi itself is still lack of knowledge: much more is 
needed before Gobi could be used as a tool to play the complete game of Go. 
Unfortunately, adding knowledge takes a long time since it must be hand coded; 
most of the top programs have had knowledge added over periods of years. In 
this respect this approach (like many current approaches) is at a disadvantage 
to learning or non-symbolic approaches, which can use automatic training to 
improve. 

8 Conclusions 

In this paper we have presented an adversarial planning architecture capable 
of reasoning about games, and an application of this architecture to Go. The 
planning architecture and Go reasoner reported here represent an advance on 
previous work for goal driven planning in Go. The system 

— has a clear separation of domain knowledge from the abstract planning ar- 
chitecture and a clear model of the opponent in the game, 

— can reason at multiple levels of abstraction simultaneously, 

— can address complex tactical situations as well as high level strategic prob- 
lems, 

— can provide support for the integration of data driven and goal driven ap- 
proaches. 

We presented the advantages that a goal driven approach could have for 
Go and discussed the importance of mixing data and goal driven aspects when 
working in complex domains such as Go. Gobi as a prototype is certainly no 
match for current systems which play the full game of Go (and indeed cannot 
yet play the full game), however it does represent a step towards understanding 
how goal driven approaches could be applied to Go, even at a tactical level. 
Gobi needs more knowledge to be added before it is applicable in other areas of the game. A 
further important step is to incorporate higher level, persistent goals which allow 
Gobi to play the whole game of Go. We also aim to further develop structures 
for mixing data and goal driven approaches. 

Go has several strong domain features which make goal driven approaches 
applicable (very large search spaces, clear layers of abstraction in domain de- 
scriptions — stones, strings, groups etc, and a wealth of knowledge similar in 
structure to abstract plans). Gobi represents a further step towards turning this 
theoretical possibility into a reality. The work described in this paper again shows 
that Go is an excellent test bed for AI research. There has been very little work 
on adversarial planning in recent years — the challenge of Go really motivated 
this work. 
Abstract. This paper describes first results from the application of Temporal 
Difference learning [1] to shogi. We report on experiments to determine 
whether sensible values for shogi pieces can be obtained in the same manner as 
for western chess pieces [2]. The learning is obtained entirely from randomised 
self-play, without access to any form of expert knowledge. The piece values 
are used in a simple search program that chooses shogi moves from a shallow 
lookahead, using piece values to evaluate the leaves, with a random tie-break 
at the top level. Temporal difference learning is used to adjust the piece values 
over the course of a series of games. The method is successful in learning 
values that perform well in matches against hand-crafted values. 

Keywords: Learning, Shogi, Temporal Difference, Minimax, Search, Game- 
playing. 



1 Introduction 

This paper describes results from the application of Temporal Difference learning [1] 
to learning the relative values of shogi pieces. The learning is obtained entirely from 
randomised self-play, without access to any form of expert knowledge. The piece 
values are used to evaluate the leaves of a minimax search tree, and temporal 
difference learning is used to adjust these values over the course of a series of games. 

A combination of machine-learning methods, including TD learning, has earlier 
been used to learn chess piece values for use by a program performing a 1-ply search 
only [3], and also for coarse-grained piece values [4]. TD learning has also been used 
successfully [5] by Baxter, Tridgell and Weaver to improve the weights of a complex 
chess evaluation function consisting of positional terms as well as piece values, when 
playing against knowledgeable opponents. However, they found it necessary to 
provide piece weights as initial knowledge to obtain good performance. 

The focus of this work is on learning from self-play alone, with no knowledge 
input, as this is of greater potential value for problems where existing expertise is not 
available, or where the computer program may be able to go beyond the level of 
existing knowledge. 



H.J. van den Herik, H. Iida (Eds.): CG'98, LNCS 1558, pp. 113-125. 1999. 
Previous experiments in learning from self-play with no initial knowledge [2] have 
found the Temporal Difference (TD) learning method alone to be highly successful in 
learning chess piece values that performed well in a computer program. Shogi is 
significantly different from chess, as detailed in section 2 below, and there are no 
standardised piece values to compare against. Although human shogi players avoid 
ascribing fixed values to shogi pieces, they do generally agree that rooks and bishops 
are the most powerful pieces, and that pawns have least value. 

Our investigation was designed to discover whether the same TD technique would 
perform as satisfactorily in shogi as it had in chess, and whether it would yield 
sensible values for shogi pieces. 

We performed experiments to learn suitable values for thirteen adjustable weights 
(seven main and six promoted shogi piece types). The experiments were conducted 
with a variety of learning and search parameters, and random seeds, and the 
consistency of results was surveyed. To demonstrate that the values learnt are 
reasonable, we played matches between four versions of the program, one using piece 
values learnt during the experiments, two using values obtained from other shogi 
programs and a fourth using values estimated by a human programmer. 



2 Shogi 

Shogi is the Japanese name for the Japanese version of chess. It belongs to the same 
family of games as western chess and Chinese-chess (Xiangqi). Throughout this 
paper we refer to western chess as just chess, and Japanese chess as shogi. An 
introduction to the rules and some basic strategies of shogi is given by Fairbairn [6] 
and Leggett [7]. Matsubara, Iida & Grimbergen [8] suggest shogi is an appropriate 
target for current game-playing research, and discuss the similarities and differences 
between shogi and chess. In shogi, captured pieces are not eliminated from the game, 
but kept in hand by the capturing player, and may later be returned (dropped) on 
almost any vacant square. This greatly increases the branching factor of the game 
tree, and makes the game less amenable to full-width searching techniques. An 
additional feature of shogi is that all pieces apart from the king and gold are eligible 
for promotion once they reach the promotion zone (the last three ranks of the board). 
Pawns, Lances, Knights and Silvers may all promote to Golds upon entering the 
promotion zone, whereas Rooks and Bishops promote to more powerful pieces. 

Choosing suitable values for shogi pieces is a problem for game programmers, as 
shogi experts prefer not to allocate values to the pieces. Sensible values for chess 
pieces are fairly widely known, but there is no generally-agreed standardized set of 
values for shogi pieces. Hence shogi programmers have more need for machine 
learning to generate material values for use in evaluation functions. 
3 Temporal Difference Learning 

Temporal difference (TD) learning methods apply to multi-step prediction problems. 
Each prediction is a single number, derived from a formula using adjustable weights, 
for which the derivatives with respect to changes in weights are computable. Each 
pair of temporally successive predictions gives rise to a recommendation for weight 
changes. Sutton [1] shows that TD methods make more efficient use of their 
experience than conventional prediction-learning methods, converging faster and 
producing more accurate predictions. 

Weight adjustments are made according to: 

$$\Delta w_t = \alpha \,(P_{t+1} - P_t) \sum_{k=1}^{t} \lambda^{t-k} \nabla_w P_k \qquad (1)$$

where P is a series of temporally successive predictions, w the set of adjustable 
weights, α is a parameter controlling the learning rate, and ∇_w P_t is the vector of 
partial derivatives of P_t with respect to w. The recency parameter, λ, allows for an 
exponential weighting with recency of predictions occurring k steps in the past. 
TD(λ) learning enabled Tesauro’s backgammon program to reach master level [9], 
[10]. 
The process may be applied to any initial set of weights. Learning performance 
depends on α and λ, which have to be chosen appropriately for the domain. In 
principle, TD(λ) weight adjustments may be made after each move, or at any 
arbitrary interval, but for game-playing tasks, the end of every game is a convenient 
point to actually alter the evaluation weights. 
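
The end-of-game update can be sketched as follows. This is a minimal illustration of the TD(λ) rule described above, not the code used in the experiments: the function name `td_lambda_update`, the plain-list representation of weights and gradients, and the use of the final game outcome as the target for the last prediction are all assumptions made for the example.

```python
def td_lambda_update(weights, predictions, gradients, outcome, alpha, lam):
    """Return the weights after applying TD(lambda) to one complete game.

    predictions[t] is the prediction P_t made after move t.
    gradients[t] is the list of partial derivatives of P_t w.r.t. each weight.
    outcome is the final result (assumed here: 1.0 for a win, 0.0 for a loss);
    it serves as the target that follows the last prediction.
    """
    targets = list(predictions[1:]) + [outcome]
    trace = [0.0] * len(weights)      # running sum of lam**(t-k) * grad P_k
    new_weights = list(weights)
    for p_t, p_next, grad in zip(predictions, targets, gradients):
        # Decay the older gradients and add the current one.
        trace = [lam * e + g for e, g in zip(trace, grad)]
        # Temporal difference between successive predictions, scaled by alpha.
        step = alpha * (p_next - p_t)
        new_weights = [w + step * e for w, e in zip(new_weights, trace)]
    return new_weights


# Hypothetical two-move game with two weights.
updated = td_lambda_update(
    weights=[0.0, 0.0],
    predictions=[0.5, 0.6],
    gradients=[[1.0, 0.0], [0.0, 1.0]],
    outcome=1.0,
    alpha=0.1,
    lam=0.5,
)
print(updated)
```

Keeping a running trace avoids recomputing the inner sum over all earlier predictions at every step, so the cost per move is proportional to the number of weights.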



3.1 Obtaining Prediction Probabilities from Evaluation Scores 



The evaluation score from the position at the end of the principal variation (the 
principal position) is "backed up" to the root of the search, and is regarded as a 
prediction of the final outcome of the game, to be compared with future values by the 
temporal difference method. The TD method requires a probability of winning, 
rather than a material score, so the values returned by the search are converted by use 
of a standard sigmoid squashing function. Thus the predicted probability of winning, P, 
from a given position is determined by: 

$$P = \frac{1}{1 + e^{-v}} \qquad (2)$$

where v is the ‘evaluation value’ of the position. 

This sigmoid function has the advantage that it has a simple derivative: 



$$\frac{dP}{dv} = P\,(1 - P) \qquad (3)$$
The derivative appears in the classical supervised-learning procedure on which 
TD(λ) is based - the adjustments are proportional to the derivative so that weights 
which have little effect on the prediction are adjusted less than weights to which the 
prediction is more sensitive. 




Fig. 1. Conversion of position value into prediction probability 

The figure shows the conversion of position value into prediction probability. The 
example score of +1 Rook (using the learnt value of Rook = 1.21 from Run3) is 
converted into a probability of winning of 0.77. 
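
The conversion can be checked with a few lines of code. The sketch below assumes the standard logistic form of the squashing function; the function names are illustrative only.

```python
import math


def win_probability(v):
    """Standard sigmoid: maps an evaluation value v to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def win_probability_derivative(v):
    """Derivative of the sigmoid with respect to v, in the form P * (1 - P)."""
    p = win_probability(v)
    return p * (1.0 - p)


# A score of +1 Rook, using the learnt value Rook = 1.21 from Run3.
print(round(win_probability(1.21), 2))  # 0.77
```

A level position (v = 0) maps to a probability of 0.5, where the derivative takes its maximum value of 0.25.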

Note that in Shogi, unlike chess, the value of a piece is not the same as the change 
in the material balance when a piece is captured. For example, when capturing an 
opponent’s promoted rook the change in material balance needs to take into account 
both the loss of the promoted rook to the opponent, and also the gaining of a rook in 
hand for the capturing side. The +1 rook score shown in figure 1 represents the value 
of a rook, and does not represent the effect of a rook capture, which would change the 
material balance by twice that value. 



4 The Shogi-Playing Search Engine 

The experiments used a search engine derived from a conventional chess program, 
with an iteratively-deepened search, alpha-beta pruning and a quiescence search at 
the horizon restricted to captures and promotions [11]. To prevent undue search 
effort being expended in the quiescence search, it was limited to six plies. Null-move 
pruning [12], [13] was used to reduce the size of the search tree, and the search was 
made more efficient by the use of a transposition table. The evaluation function 
applied at the leaves of the quiescence search consisted of the material score only, 
with the move choice at the root being made randomly from the materially-equal 
moves. 

The thirteen piece values being learnt were used by the evaluation function. The 
material balance for a position was calculated as the sum of all the values of the side 
to move’s pieces (including pieces in hand), minus the sum of all the values of the 
opponent’s pieces. 
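
A material-only evaluation of this kind might look like the sketch below. The piece names and all values other than Rook = 1.21 are hypothetical, chosen only to show why capturing a promoted rook changes the balance by the value of the promoted rook plus the value of a rook in hand.

```python
def material_balance(values, own_pieces, opponent_pieces):
    """Sum of the side to move's piece values minus the opponent's.

    Each piece list contains pieces on the board and pieces in hand.
    """
    own = sum(values[piece] for piece in own_pieces)
    opponent = sum(values[piece] for piece in opponent_pieces)
    return own - opponent


# Hypothetical values; only the rook value is taken from the text.
values = {"pawn": 0.2, "rook": 1.21, "promoted_rook": 1.5}

# Before the capture the opponent owns a promoted rook.
before = material_balance(values, ["pawn"], ["promoted_rook"])
# After the capture the piece is gone and the capturer holds a plain rook in hand.
after = material_balance(values, ["pawn", "rook"], [])

print(after - before)
```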



5 The Experiments 



We performed many learning runs to explore the behaviour of the TD method using a 
variety of learning rates and values for λ. All games were played using the shogi 
engine described in section 4, with a three-ply-deep main search. To prevent the 
same games from being repeated, the move lists were randomised. A random choice 
is thus made from all tactically equal moves, and it has the added benefit of ensuring 
a wide range of different types of position are encountered. 
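
One simple way to obtain this behaviour is to pick uniformly among the root moves that share the best search value. The sketch below is an assumption about how such a tie-break could be written; the engine described here randomised its move lists instead, which has the same effect on the final choice.

```python
import random


def choose_move(scored_moves, rng=random):
    """Pick at random among the moves whose search value equals the best value.

    scored_moves is a list of (move, value) pairs for the root position.
    rng is any object with a choice() method (the random module by default).
    """
    best = max(value for _, value in scored_moves)
    candidates = [move for move, value in scored_moves if value == best]
    return rng.choice(candidates)


# Hypothetical root moves: "b" and "c" are materially equal and best.
move = choose_move([("a", 1.0), ("b", 2.0), ("c", 2.0)], random.Random(0))
print(move)
```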

During each game a record is kept of the value returned by the search after each 
move, and the corresponding principal position. These values are converted into 
prediction probabilities by the squashing function, and then equation (1) is used to 
determine adjustments to the weights at the end of each game. 

In the experiments reported here we used a value for λ of 0.95, and a variable 
value of α that decreased during each learning run, from 0.05 to 0.002. At the start of 
each run we initialised all weights to 1, so that no game-specific knowledge was 
being provided via the initial weights. 

The experiments learn values for the pieces entirely from randomised self-play. 
This method has the advantage that it requires no play against well informed 
opponents, nor are games played by experts supplied. The piece weights are learnt 
“from scratch”, and do not need to be initialised to sensible values. The only 
shogi-specific knowledge provided is the rules of the game. Whilst each learning run 
consists of several thousand games, this represents a relatively short amount of 
machine time, and the entire run can be completed without any external interaction. 



6 Results 



We present results from five separate learning runs of 6000 games each. The learning 
runs were identical except that a different random number seed was used in each one, 
ensuring that completely different games were played in each. We shall refer to these 
learning runs as Run1 through Run5. 
6.1 Weight Traces 

Figure 2 shows the weight traces for un-promoted pieces for a typical learning run 
(Run3) of 6000 games. A decaying learning rate was used for the first half of the run, 
decreasing from 0.05 to 0.002. Once the learning rate reached 0.002, it remained 
constant for the remainder of the run. Very similar results were achieved using a 
fixed learning rate of 0.002, but the runs required more games to achieve stable 
values. 
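
A schedule of this kind could be written as below. The text gives only the end points (0.05 and 0.002) and states that the decay covers the first half of the run; the linear shape of the decay is an assumption made purely for illustration.

```python
def learning_rate(game, total_games=6000, start=0.05, end=0.002):
    """Learning rate for a given game index (0-based).

    Decays from `start` to `end` over the first half of the run
    (linearly, which is an assumption), then stays at `end`.
    """
    decay_games = total_games // 2
    if game >= decay_games:
        return end
    return start + (end - start) * game / decay_games


for game in (0, 1500, 3000, 5999):
    print(game, learning_rate(game))
```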



Fig. 2. Typical weight traces (main pieces). [Plot not reproduced; horizontal axis: Games (0 to 6,000); trace labels: Rook, Bishop, Silver, Gold, Knight, Lance, Pawn.] 



From figure 2 we can see that the relative ordering of the main pieces has been 
decided after about 4000 games, and that pieces remain in that relative order for the 
remainder of the run. During the last 2000 games there is still considerable drift in 
the values. Some random drift is to be expected as a result of the random component 
included in the move choice. We averaged the values over the last 2000 games in 
order to obtain values for testing against other weight sets. 

Figure 3 shows the weight traces for promoted pieces from Run3. Comparing 
figures 2 and 3 we can see that the promoted piece traces appear more stable than the 
main piece traces. This is because adjustments to the promoted piece types occur less 
frequently during the course of a game. Indeed, some games may not contain a single 
instance of a given promoted piece type. There is no trace for Gold, because they do 
not promote. 
Fig. 3. Typical weight traces (promoted pieces) 

Fig. 4. Normalised learnt values for 5 runs (main pieces). [Chart not reproduced; horizontal axis: Pawn, Lance, Knight, Silver, Gold, Bishop, Rook.] 
6.2 Main Piece Values 



Figure 4 shows relative values for the seven main piece types, from each of the five 
learning runs. To avoid fluctuations in the weights due to noise from the stochastic 
nature of the game-playing process, these values represent the average over the last 
2000 games in each of the five runs. 

It is the relative values of the pieces that govern move selection, not the absolute 
values. This enables us to readily compare the values from the five runs by 
normalising them so that Rook=5. (In chess, one often refers to the values of pieces 
in terms of pawns, e.g., “A knight is worth three pawns”. In shogi, there is no such 
commonly used metric. However, in certain rare situations [3] the rules of shogi state 
that Rooks are to be scored as five points each, and all other pieces as one point each. 
We chose to use the five-point rook score as our reference value for normalising.) 
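
The normalisation itself is a single rescaling. In the sketch below the raw weights are hypothetical (apart from Rook = 1.21, the Run3 value quoted earlier), and the function name is illustrative.

```python
def normalise_to_rook(values, rook_value=5.0):
    """Rescale all piece values so that the Rook is worth `rook_value`.

    Relative values are unchanged, so move selection is unaffected.
    """
    scale = rook_value / values["Rook"]
    return {piece: value * scale for piece, value in values.items()}


# Hypothetical raw weights from a learning run.
raw = {"Pawn": 0.121, "Bishop": 0.968, "Rook": 1.21}
normalised = normalise_to_rook(raw)
print(normalised)
```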

From figure 4 we can see that each of the five runs has learnt the same ordering of 
the pieces (Pawn, Lance, Knight, Silver, Gold, Bishop, Rook). In addition, the 
relative magnitude of the learnt values is remarkably consistent across the five runs. 



6.3 Promoted Piece Values 



Figure 5 shows the normalised relative values for the six promoted piece types (Golds 
do not promote). The values for promoted Bishops and Rooks are substantially greater 
than those for their un-promoted counterparts. 

When promoted, Pawns, Lances, Knights and Silvers all promote to piece types 
that move in exactly the same way as a Gold. Despite this it can be seen from Figure 
5 that the learnt values for these promoted types differ considerably. Partly this may 
be due to low numbers of promotions in the games, which leads to higher run-to-run 
variance, and to end-of-run values which are not yet settled. 

But, in particular, the value learnt for promoted Pawns is consistently greater than 
those learnt for promoted Lances, Knights, or Silvers. This is probably partly due 
to the fact that the act of promoting a Pawn has the additional benefit of making all 
empty squares in that file available for the subsequent dropping of a Pawn in hand. 
(Shogi prohibits dropping a Pawn into a file that already contains a friendly un- 
promoted Pawn.) 

Another issue that may affect the value of promoted pieces is the value to the 
opponent if they are captured. For example a promoted Pawn gives the opponent only 
a lowly Pawn in hand, whereas a captured promoted Silver gives the opponent a 
Silver in hand. Thus the promoted Pawn is more expendable than a promoted Silver, 
even though their capabilities on the board are the same. 
Fig. 5. Normalised learnt values for 5 runs (promoted pieces) 



6.4 Matches to Test the Learnt Values 



To test the effectiveness of the learnt values in our domain, a number of matches were 
played between identical search engines using various different piece values. The 
search engines used were the same as used in the learning experiments, but the piece 
weights were fixed to a given set of piece values, and not adjusted during the match. 

Each match consisted of 2000 games, alternating Black and White (Sente and 
Gote). Games that ended in mate were scored as 1 point for the winning side. Games 
that were unfinished after 600 ply (300 moves each) were scored as ½ a point for both 
sides. 

We ran two sets of matches. The first was effectively a mini-tournament to 
compare the average values from all five learning runs with values obtained from 
other sources. The second set compared each of the five sets of learnt values with the 
best of the values from other sources. 

The Beginner piece values were decided by a shogi beginner (but experienced 
game programmer), guided by the advice given by Leggett [7]. 
Fig. 6. Value sets tested in match play (main pieces) 

[Chart legend: Beginner, Gnu, YSS, Learnt] 

Fig. 7. Value sets tested in match play (promoted pieces) 



The Gnu-derived piece values were derived from those used by the widely 
available program Gnu Shogi [14]. This program uses four different sets of piece 
values, depending on the stage of the game, as determined by various heuristics. The 
Gnu-derived values are the average of these four sets. 
The YSS piece values are those published on the WWW by Hiroshi Yamashita 
[15], author of YSS 7.0, winner of the 7th World Computer Shogi Championship in 
1997. 

The evaluation functions of both Gnu Shogi and YSS also contain more 
sophisticated positional terms, e.g. king safety. In both programs, piece values are 
fundamental and typically the largest component of the overall evaluation score for a 
position. (Positional factors may also reward material possession indirectly. We 
ignored this secondary effect for these matches.) 

The Learnt piece values are the average of the values presented in figures 4 and 5. 

Figures 6 and 7 show the piece values used in the matches, again normalised to 
Rook=5. 



Table 1. Match results 

Side 1          Side 2      Games    Win      Loss   Draw 
Learnt   vs.    Beginner    2,000    1,206    718    76 
Learnt   vs.    Gnu         2,000    1,170    766    64 
Learnt   vs.    YSS         2,000    1,071    871    58 
YSS      vs.    Beginner    2,000    1,113    835    52 
YSS      vs.    Gnu         2,000    1,146    784    70 
Gnu      vs.    Beginner    2,000    1,018    911    71 

Table 2. Mini-tournament cross-table 

            Learnt    YSS    Gnu    Beginner 
Learnt      X         55%    60%    62% 
YSS         45%       X      59%    57% 
Gnu         40%       41%    X      53% 
Beginner    38%       43%    47%    X 



Table 1 gives the details of the matches played in the mini-tournament, and table 2 
shows the cross-table of results. The Learnt values performed better than any of the 
other value sets under our test conditions, scoring 55%, 60% and 62% against the 
YSS, Gnu-derived, and Beginner value sets respectively. 
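
The percentages in the cross-table follow from the match results under the scoring rule given earlier (1 point for a win, half a point to each side for an unfinished game). The short sketch below recomputes them from the Table 1 figures; the function and variable names are illustrative.

```python
def score_percentage(wins, draws, games):
    """Percentage score with a win worth 1 point and a draw worth half a point."""
    return 100.0 * (wins + 0.5 * draws) / games


# (side 1, side 2, games, wins, losses, draws) as reported in Table 1.
TABLE_1 = [
    ("Learnt", "Beginner", 2000, 1206, 718, 76),
    ("Learnt", "Gnu", 2000, 1170, 766, 64),
    ("Learnt", "YSS", 2000, 1071, 871, 58),
    ("YSS", "Beginner", 2000, 1113, 835, 52),
    ("YSS", "Gnu", 2000, 1146, 784, 70),
    ("Gnu", "Beginner", 2000, 1018, 911, 71),
]

for side1, side2, games, wins, losses, draws in TABLE_1:
    # Every game is a win, a loss, or an unfinished game scored as a draw.
    assert wins + losses + draws == games
    pct = score_percentage(wins, draws, games)
    print(f"{side1} vs. {side2}: {pct:.1f}%")
```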

The Learnt values were the average of the five learning runs. To verify that each 
of the individual learning runs learnt reasonable weights, each was pitted in a match 
against the YSS values, which performed the best of the three non-learnt sets. Table 
3 shows the results from these matches, and shows that each of the five learning runs 
produced values that beat the YSS values under our test conditions. 
Table 3. Individual learning run match results against YSS values. 

        Games    Win      Loss   Draw   Percent 
Run1    2,000    1,062    871    67     55% 
Run2    2,000    1,044    888    68     54% 
Run3    2,000    1,009    915    76     52% 
Run4    2,000    1,070    852    78     55% 
Run5    2,000    1,004    925    71     52% 


7 Conclusions 

The shogi piece values were learnt from self-play without any domain-specific 
knowledge being supplied. Although shogi experts are traditionally reluctant to 
assign values to the pieces, we believe that our learnt values would be recognised by 
human experts as reasonable for use in a shogi program. 

It should be noted that these experiments have learnt material values within a 
material-only evaluation function. We would expect the material values learnt to be 
somewhat different if the evaluation function included positional scoring terms. Also, 
our results are obtained using a specific set of search parameters (depth, selectivity, 
quiescence details, etc). These may influence the optimum values, although we 
would expect changes to search parameters to have less of an effect on learnt values 
than additional evaluation terms. The method could be applied to any other set of 
search parameters, and other search engines. It is also applicable to learning an 
appropriate weight for positional evaluation terms, and we expect it to be useful in 
learning weights for more sophisticated evaluation functions in both chess and shogi. 
Abstract. This paper discusses a practical framework for the semi-automatic 
construction of evaluation-functions for games. Based on a 
structured evaluation function representation, a procedure for exploring 
the feature space is presented that is able to discover new features in 
a computationally feasible way. Besides the theoretical aspects, related 
practical issues such as the generation of training positions, feature se- 
lection, and weight fitting in large linear systems are discussed. Finally, 
we present experimental results for Othello, which demonstrate the po- 
tential of the described approach. 

Keywords: automatic feature construction, GLEM, Othello 



1 Introduction 

Many AI systems use evaluation functions for guiding search tasks. In the con- 
text of strategy games they usually map game positions into the real numbers for 
estimating the winning chance for the player to move. Decades of research have 
shown how hard a problem evaluation function construction is, even when focus- 
ing on particular games. In order to simplify the construction task, the notion 
of evaluation features was introduced. The underlying assumption is that there 
exist reasonable approximations of the perfect evaluation function in the form 
of combinations of a few distinct numerical properties of the position — called 
features. Given this, evaluation functions can be constructed in two phases by 
1) selecting features and 2) combining them. 

Selecting features is one of the most important and difficult sub-tasks in the 
construction of a game playing program. It requires both domain specific knowl- 
edge and programming skills because of the well known tradeoff between speed 
and knowledge in game-tree search. A couple of years ago, the authors of the 
best game playing programs still picked not only features but also their weights 
in the course of a tedious optimization process. This is somewhat surprising, since 
already in f7j proposed ways for automatically tuning weights. While selecting 
features is difficult for a machine, fitting even a large number of weights given 
a set of training positions is not. Research focused on the latter topic produced 
TD-Gammon, a world-class backgammon-program BED and contributed to 
Deep Blue’s victory over Kasparov in 1997 JJ|. 



H.J. van den Herik, H. Iida (Eds.): CG’98, LNCS 1558, pp. 126-145, 1999. 
© Springer-Verlag Berlin Heidelberg 1999 
In this article we go a step further towards the ultimate goal of automatic 
evaluation function construction. First, a generalized linear evaluation model 
is presented. It restricts evaluation features to boolean combinations of given 
atomic functions. The model parameters can be tailored such that an automatic 
feature space exploration becomes feasible. The following sections cover all as- 
pects of evaluation function construction — from generating training positions 
over feature selection to weight estimation — with respect to the new model 
and emphasis on efficient implementation. Finally, we show how the presented 
techniques can be applied to the game of Othello and discuss the new approach 
with regard to related work. 

2 Evaluation Model 

We first give a definition of the evaluation model we are proposing and discuss 
its properties. In what follows, V denotes the set of all legal game positions (see footnote 1), 
and ℝ the set of real numbers. Let A be a finite set of integer valued (so 
called atomic) features and R_A := { (f(·) = k) | f ∈ A, k is an integer } 
the set of relations over A that compare feature values with integer constants. 
Configurations are conjunctions of relations in R_A. For a position p ∈ V and a 
configuration c = r_1 ∧ ... ∧ r_l we define 

$$c(p) := r_1(p) \wedge \dots \wedge r_l(p), \qquad \mathrm{val}(c(p)) := \begin{cases} 1 & \text{if } c(p) \text{ is true,} \\ 0 & \text{otherwise.} \end{cases}$$

A configuration c is called active in a position p, iff c(p) = true. 

With this notation we can now define the Generalized Linear Evaluation 
Model, GLEM(V, A, g) for short. Evaluation functions in this model have the 
following form: 

$$e_w(p) := g\Big( \sum_{i=1}^{n} w_i \cdot \mathrm{val}(c_i(p)) \Big) \qquad (1)$$

where c_1, ..., c_n are configurations over R_A, w_1, ..., w_n ∈ ℝ are weights, and 
g : ℝ → ℝ is an increasing and differentiable link function. 

The weights are subject to the usual least-squares optimization. That is, 
given a set of configurations c_1, ..., c_n, a link function g, and a sequence of 
scored training positions ((p_i, r_i) | i = 1 ... N), the weights are chosen such that 
the total squared error 

$$E(w) := \sum_{i=1}^{N} \big( r_i - e_w(p_i) \big)^2$$

is minimized. This model has several desirable properties: 



1 W.l.o.g. it is assumed that game positions in V are normalized in such a way that a 
fixed player is to move. 
— Atomic features are the building blocks of more sophisticated ones. This, 
in principle, allows the automated discovery of new important features by 
systematic combination. 

— If necessary, complex features can be added to A. Thus, “atomic” is not 
necessarily a synonym for “simple”. 

— When evaluating a position, features are combined linearly. This keeps the 
time overhead low. Actually, not even a multiplication with the weight is 
necessary since val(c_j(p)) is either 0 or 1. 

— Non-linear effects can be approximated by using configurations that consist 
of several relations. 

— In order to deal with saturation an increasing non-linear link function, such 
as g(x) = 1/(1 + exp(-x)), can be used without increasing the run time 
during minimax search. There is no need to compute g, because g(x_1) > 
g(x_2) ⟺ x_1 > x_2. 

— The simple linear core of the evaluation function allows an efficient approxi- 
mation of optimal weights, even for large systems. In the application reported 
later, more than a million weights were fitted to a training set consisting of 
eleven million scored positions in a reasonable period of time. 
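
The properties listed above can be illustrated with a small sketch. It is not the implementation used in the application: positions are assumed to be tuples of integers, atomic features are square contents, and every name (`make_relation`, `val`, `linear_score`, `evaluate`, `squared_error`) is hypothetical.

```python
import math


def square_feature(s):
    """Atomic feature: the integer contents of square s."""
    return lambda p: p[s]


def make_relation(feature, k):
    """Relation (f(.) = k): holds in position p iff feature(p) == k."""
    return lambda p: feature(p) == k


def val(configuration, p):
    """1 if all relations of the configuration hold in p (it is active), else 0."""
    return 1 if all(relation(p) for relation in configuration) else 0


def linear_score(p, configurations, weights):
    """Sum the weights of active configurations; no multiplication is needed."""
    return sum(w for c, w in zip(configurations, weights) if val(c, p))


def logistic(x):
    """Increasing, differentiable link function g."""
    return 1.0 / (1.0 + math.exp(-x))


def evaluate(p, configurations, weights, g=logistic):
    """Evaluation of position p: the link function applied to the linear score."""
    return g(linear_score(p, configurations, weights))


def squared_error(training, configurations, weights):
    """Total squared error over scored training positions (p_i, r_i)."""
    return sum((r - evaluate(p, configurations, weights)) ** 2 for p, r in training)


f0, f1 = square_feature(0), square_feature(1)
configurations = [
    [make_relation(f0, 1)],                        # single relation
    [make_relation(f0, 1), make_relation(f1, 2)],  # conjunction of two relations
]
weights = [0.5, -1.0]

p, q = (1, 2, 0), (1, 0, 0)
# g is increasing, so comparing linear scores orders positions exactly as g would.
print(linear_score(p, configurations, weights), linear_score(q, configurations, weights))
print(evaluate(q, configurations, weights) > evaluate(p, configurations, weights))
```

During search only `linear_score` would need to be called, since the ordering of positions is unchanged by the link function; `evaluate` matters when fitting weights against scored training positions.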

At this point GLEM should be put into perspective: in the stated 
form it is neither a new revolutionary evaluation approach, nor does it ease the 
task of automatic evaluation function exploration. This is because the model is 
built upon well known linear evaluation functions and does not impose a severe 
restriction on the structure of functions it includes. E.g., for any atomic feature 
set A, which is capable of distinguishing any two different positions (including 
game history if the game result depends on it) via conjunctions over R_A, GLEM 
covers all evaluation functions over V. A trivial example for such a complete 
atomic feature set for board games without position repetition is 

{ f_s | s is a square }, with f_s(p) = contents of square s in position p, 

where the contents of a square is considered to be an integer value. 

However, GLEM allows one to define a hierarchy of submodels in a natural 
way, which reflects different levels of computational complexity and the expres- 
sive power of the covered evaluation functions. By restricting the size of A, the 
number of configurations, or their structure, an automated search for new fea- 
tures becomes feasible. In the application discussed later, evaluation functions 
based on GLEM outperformed the best known functions so far. In this respect, 
GLEM breaks new ground. 

Good evaluation functions accurately estimate the winning chances in posi- 
tions visited during game-tree search and are optimized for speed. Therefore, the 
following topics have to be borne in mind when using scored training positions 
for tuning configuration weights: 

— The training positions have to be representative of the positions that will be
evaluated later in actual game-tree search.

— Training positions must be scored accurately.

— The selected configurations and their combination must have the expressive 
power to explain the data reasonably well while avoiding over-fitting. Given
the flat evaluation function representation in GLEM, meeting this condition 
may require a large number of configurations. Their automatic construction 
is therefore of great interest. 

— Evaluation speed is important. 

— While computing weights is an off-line process, its memory and time con- 
sumption should still be subject to optimization. The reason is that in the 
feature selection phase usually many evaluation function versions have to be 
compared. Moreover, without optimization the current solver might not be 
able to handle the number of features one would like to use. 

In the following sections these topics are discussed in detail in the context of 
GLEM. 

3 Training Positions 

A theory of how to generate good training sets in the context of evaluation 
function tuning has not been developed yet. In this section practical ideas are 
discussed which may become the seed for further investigations. 

Training positions can be generated and scored in several ways. If the con- 
sidered game has a long tradition and is quite popular, many games may be 
available in electronic format. The simplest scoring procedure assigns the final 
game result (depending on the side to move) to all positions occurring in a 
game. Obviously, this ad hoc procedure has limitations, since it does not ensure 
accurate scoring. Selecting games between good players alleviates this problem. 
But this approach leads to high-quality games, in which hardly any catastro- 
phe takes place, such as losing material in chess or a corner in Othello without 
compensation. The reason for this is obvious: good players know the important 
evaluation features and keep them mostly balanced in their games. What we 
(and machines) can learn from such games are the finer points of play, which 
make the difference between good and the best players. However, an evaluation 
function must also be aware of the most important features. Thus, our training 
set should also contain games in which at one point a player makes a serious 
mistake that is rigorously exploited by the opponent. In summary, a reason- 
able strategy for generating training positions from a game database is to select 
games played by at least one good player and to score game positions according 
to the final game result. This procedure is efficient and its output can serve as 
the basis for tuning the first evaluation function version. 

Besides the potential mis-scoring problem, which is still present, the question
arises whether the training set generated in this way is representative of the
positions encountered in game-tree search. This question is important, since the
weight fit for a
linear evaluation function is influenced by the correlation among features in the 
training set. The answer obviously depends on the type of game-tree search 
we are conducting: in a highly selective search evaluated positions are in the
vicinity of principal variations, whereas in brute-force searches many ridiculous
positions are evaluated, which one would never encounter in actual games. It 
seems natural to let the search algorithm generate the training positions by it- 
self. For instance, starting searches with positions from played games, a random 
subset of evaluated positions can be saved in a file and serve as the training 
set after scoring. In this way, the generated positions are surely a representative 
sample of the positions encountered in game-tree searches. It remains to assign 
accurate scores to the positions. This task can be accomplished again by game- 
tree searches, which normally return more reliable results than the evaluation 
function itself. In particular, in many games endgame positions can be evaluated 
perfectly — or at least more accurately than middle-game or opening positions 
— in a reasonable amount of time. In this case, a game-stage dependent evalu- 
ation function can be improved iteratively by first tuning the endgame weights. 
Thereafter, training positions from the previous game stage are evaluated by a 
game-tree search, which utilizes the just tuned evaluation function, and so on. 
The next step would be to generate even positions and those with a narrow 
advantage for one side. Similar to considering games between good players men- 
tioned above, these positions are useful for tuning weights of minor features or 
revealing possible tradeoffs between major features (e.g. material vs. king safety 
in chess or corner possession vs. mobility in Othello). 

If training positions are selected randomly during minimax based searches, 
one soon discovers that the winning chance in such positions is biased towards 
the player to move. This phenomenon is easy to explain, given the fact that 
in typical positions the majority of searched moves lose. Its undesirable effect 
on fitted weights is an artificial bonus for the player to move. This, in turn, 
leads to unstable evaluations, which compromise comparing evaluations backed 
up from depths of odd difference during selective search. Because the proposed 
generation procedure labels positions with search results, a simple cure for this 
problem is to add the principal variation successor positions to the training set 
after labelling them with the negated search result. 
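The cure described above can be sketched as follows. This is an illustration only: the function name and the tuple layout are hypothetical, and the sketch assumes scores that are symmetric around 0 (such as a disc differential), so that negation switches the point of view. For scores in [0, 1] one would use 1 - score instead.

```python
def augment_with_pv_successors(samples):
    """Add principal-variation successors with negated labels.

    samples: list of (position, pv_successor, score), where score is the
    search result from the point of view of the player to move and
    pv_successor is the position after the best move (or None).
    Returns a flat list of (position, score) training pairs.
    """
    labelled = []
    for position, successor, score in samples:
        labelled.append((position, score))
        if successor is not None:
            # The successor is seen by the opponent, hence the negated score.
            labelled.append((successor, -score))
    return labelled
```

Each searched position then contributes one sample for each side to move, which counteracts the artificial bonus for the player to move.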

4 Selecting Configurations 

GLEM proposes a new perspective on how to look at evaluation features. In the 
classical approach a couple of complex features are combined linearly. Weights 
were mostly hand-tuned. Later, the study of neural networks opened up a prac- 
tical way of combining features non-linearly. Application of the well known gra- 
dient descent procedure (in this context called “back-propagation”) makes it 
possible to automatically tune a large number of network parameters. A promi- 
nent and very successful example is Tesauro’s backgammon network which, in 
its strongest version, makes use of hand-crafted features in addition to a raw 
board representation. GLEM uses a different approach. Instead of modelling 
non-linear effects by applying parameterized analytical functions to features, 
GLEM handles non-linearities directly by assigning values to boolean feature 
combinations, called configurations. In this way, distinct cases can be handled
naturally, without the detour over non-linear analytical functions. The design of
neural networks corresponds to configuration selection in GLEM, which is the 
topic of this section. After stating basic requirements for the atomic features, 
we will present an algorithm for generating configurations by analyzing training 
positions, and discuss several optimizations. 

4.1 Atomic Features 

Atomic features are the building blocks for configurations. As the scope of auto- 
matic configuration selection is limited by its time and space complexity, choos- 
ing the right abstraction level for atomic features is crucial. In Othello, configu- 
rations based upon the raw board representation are sufficient for building good 
evaluation functions — as we shall see later. The reason is that many relevant 
features in this game can be expressed by local board configurations of small 
cardinality. Other games may require a greater abstraction level. For instance, 
the relation “piece A attacks piece B” in chess has a long description length 
when using raw board representation languages. Since many important features, 
such as forks and pins, are based on those attack features, they certainly should 
be included in the atomic feature set. In general, candidates for atomic features 
are common parts of relevant features, that — combined in novel ways — may 
lead to new important features. Obviously, this selection task is beyond current 
program abilities. 

Not all atomic features have to be useful for building other features. Limita- 
tions of the configuration generator may suggest the inclusion of complex features 
that can not be expressed or well approximated by restricted combinations of 
other members of the atomic feature set. 

Moreover, GLEM generalizes the classical use of features, w · f(p), because
the factor (w · k) in

$$ w \cdot f(p) = \sum_{k} (w \cdot k) \cdot \mathrm{val}(f(p) = k) $$

specializes the weight of val(f(p) = k). This generalization is only meaningful if
f has a small range. In case one likes to incorporate a feature f having a large
range, GLEM can be easily extended by allowing summation terms of the form
w · f(p).

4.2 Generating Configurations 

In a balanced evaluation function design the number of features can be increased 
up to a point where either 1) adding additional knowledge is compensated for by 
a decreased evaluation speed or 2) over-fitting becomes a problem. Since config- 
urations can be computed quickly, once the atomic features have been evaluated, 
GLEM encourages the use of many configurations rather than a few complex
features. Our chief concern is therefore over-fitting.

We will first present an algorithm for generating a configuration set that 
does not suffer from over-fitting. Thereafter, we will discuss how to deal with a
possibly unacceptably long run time for the configuration generator, for weight
fitting, or for the configuration value look-up during game-tree search. 

Configurations have to cover positions that occur in game-tree search while 
avoiding over-fitting when optimizing weights. Both requirements can be met by 
using a large set of training positions — generated as described in the previous 
section — and selecting configurations that match a sufficiently large number of 
these positions. Fig. 1 shows a straightforward algorithm for this task. Given a
set of atomic features A, training positions E, and a minimal match count n, it 
computes all valid configurations over A that occur in at least n positions in E. 
Beginning with all valid configurations of length one, the algorithm iteratively 
builds larger configurations by specializing previously generated configurations, 
until the match count drops below n. The algorithm certainly halts, since the
set of valid configurations is finite. Its correctness can be shown by induction
using the fact that, for k > 1, valid configurations of length k have valid
subconfigurations of length k - 1.
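The generation procedure can be sketched in Python as follows. This is not the paper's implementation: it assumes that a position is a tuple of atomic feature values, that a relation is a pair (feature index, value), and that a configuration is a tuple of relations on distinct features sorted by feature index. The names `match_count` and `gen_conf` are hypothetical. Relations are appended in increasing feature order, so every configuration is built exactly once.

```python
def match_count(conf, positions):
    """Number of positions in which every relation (feature, value) of conf holds."""
    return sum(all(p[f] == k for f, k in conf) for p in positions)

def gen_conf(num_features, positions, n):
    """Return all configurations that match at least n training positions."""
    # Observed value range of each atomic feature.
    values = [sorted({p[f] for p in positions}) for f in range(num_features)]
    # Valid configurations of length one.
    singles = [((f, k),)
               for f in range(num_features) for k in values[f]
               if match_count(((f, k),), positions) >= n]
    result = list(singles)
    frontier = singles              # configurations created in the previous iteration
    while frontier:
        created = []
        for c in frontier:
            for (d,) in singles:
                # Ordered generation: only relations on later features are added.
                if d[0] > c[-1][0]:
                    e = c + (d,)    # specialize configuration c
                    if match_count(e, positions) >= n:
                        created.append(e)
        frontier = created
        result.extend(frontier)
    return result
```

Every prefix of a valid configuration matches at least as many positions as the configuration itself, which is why the ordered generation does not miss any valid configuration.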

The run time of the algorithm is O(|C| · |R_A|^2 · |E|), where C is the
computed set of valid configurations and E the set of training examples. The most
time-consuming part is computing the match counts in the inner loop. Since 
in the beginning the number of checked configurations grows exponentially in 



Function GenConf

Input:  atomic feature set A, training position set E, minimal match count n
Output: configurations over A that are active in at least n positions of E

    R := {(f(·) = k) | f ∈ A, k ∈ range(f), #match({f(·) = k}, E) ≥ n}
    C := R                      ; collects all valid configurations
    N := R                      ; set of configurations created in previous iteration

    while N ≠ ∅ do
        M := ∅                  ; set of valid configurations in current iteration
(*)     foreach c ∈ N, d ∈ R do
            e := c ∪ {d}        ; specialize configuration c
            if #match(e, E) ≥ n then
                M := M ∪ {e}    ; append if valid
            endif
        endfor
        N := M                  ; next configurations to specialize
        C := C ∪ N              ; add valid configurations
    endwhile
    return C



Fig. 1. Pseudo code for generating the set of configurations that occur in at least n
training positions. The function iteratively specializes configurations, which are
implemented as sets of relations, until the number of matching positions
(#match(e, E)) drops below n.

each iteration, it is crucial to optimize the match computations, especially if
the number of positions is large. The following optimizations speed up a naive 
implementation considerably: 

— Due to the commutativity of ∧, valid configurations of length k may have
several valid subconfigurations of length k - 1. This observation suggests
that we should check whether a given specialization has been tested before
in the current iteration, in order to avoid repeated match computations. An
even better solution is to generate specializations in an ordered fashion by
defining a total order over R and replacing line (*) by

    foreach c ∈ N, d ∈ R with d > max_{d' ∈ c} d' do

It is not hard to show that after applying this time-saving modification the 
algorithm still generates all valid configurations. 

— A naive algorithm for deciding #match(e, E) ≥ n evaluates the relations
in e for every member of E. The computation time of this algorithm can
be reduced by preprocessing and parallelizing computations. The idea is to
compute, for each r ∈ R, a sequence of bits (b_i)_i defined by
b_i := val(r(p_i)), where p_i ∈ E is the i-th training position. After this
preprocessing step, the actual features and positions are no longer needed.
The match count computation reduces to and-combining the bit sequences of
the involved relations and counting set bits in the result sequence. Modern
CPUs allow

Function MatchHeuristic

Input:  configuration e, chunk size s, random partition E_1, ..., E_m of
        position set E as described in the text, confidence level t > 0
Output: true, if #match(e, E) ≥ n is likely; false, otherwise

    q := n/#E                               ; match count fraction aimed for
    d := 0                                  ; number of elements checked
    u := 0                                  ; current match count

    for i := 1 to m - 1 do
        u := u + #match(e, E_i)             ; update counts
        d := d + s
        if u > dq + t · sqrt(dq(1 - q)) then
            return true                     ; #match(e, E) ≥ n is likely
        endif
        if u < dq - t · sqrt(dq(1 - q)) then
            return false                    ; #match(e, E) < n is likely
        endif
    endfor

    return u + #match(e, E_m) ≥ n



Fig. 2. A fast procedure for testing the hypothesis #match(e, E) ≥ n

a very efficient implementation of the and-part by handling 32 or even 64
bits in parallel. Iterating x := x ∧ (x - 1) (bitwise and), which clears the
rightmost one in the binary representation of x, allows us to count set bits
quickly. In this application table-based techniques for counting bits are
inferior because the number of set bits is decreasing rapidly due to
specialization.

— Replacing the condition #match(e, E) ≥ n by a sequential statistical test
procedure speeds up the computation further. This optimization can be
motivated by an intuitive example: if among the first 100 randomly selected
bits of 1000 there is only a single one, it is very unlikely that the total
number of ones exceeds 500. More formally, we propose the following heuristic
function, which quickly checks whether #match(e, E) ≥ n holds with a
prescribed likelihood. In a preprocessing step, E is randomly partitioned into
chunks E_1, ..., E_m of size s (E_m might have fewer elements). For a given
configuration e, the function then iteratively computes the match counts for
increasing subsets beginning with E_1. If the match count fraction at one
point significantly differs from the one we aim for, the function returns the
likely truth value of #match(e, E) ≥ n early. The pseudo code implementation
shown in Fig. 2 makes use of the fact that the expected number of
ones in a sequence of d randomly generated bits is dq, if Prob{1} = q, while
its standard deviation is sqrt(dq(1 - q)). The behaviour of this function is
controlled by confidence level t. For large values of t, hardly any break
condition will be met: the function will be slow, and almost always return the
correct result. If t is small, the function is quick, but it also returns
unreliable results. Experiments can tell how to choose t depending on the
speed/reliability one likes to achieve.
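The last two optimizations can be combined in a short sketch. It assumes Python integers as bit sequences (one integer per relation and chunk), positions given as tuples of feature values, relations given as (feature index, value) pairs, and 0 < n ≤ #E. All function names are hypothetical; `match_heuristic` follows the structure of Fig. 2.

```python
import math
import random

def popcount(x):
    """Count set bits by repeatedly clearing the rightmost one (x & (x - 1))."""
    count = 0
    while x:
        x &= x - 1
        count += 1
    return count

def relation_bits(relation, chunk):
    """Bit i is set iff the relation (feature, value) holds in chunk[i]."""
    f, k = relation
    bits = 0
    for i, p in enumerate(chunk):
        if p[f] == k:
            bits |= 1 << i
    return bits

def preprocess(positions, relations, s, seed=0):
    """Randomly partition the positions into chunks of size s and
    precompute one bit sequence per relation and chunk."""
    shuffled = positions[:]
    random.Random(seed).shuffle(shuffled)
    chunks = [shuffled[i:i + s] for i in range(0, len(shuffled), s)]
    return [{r: relation_bits(r, c) for r in relations} for c in chunks]

def chunk_match(conf, chunk_bits):
    """And-combine the bit sequences of all relations in conf, then count."""
    relations = iter(conf)
    acc = chunk_bits[next(relations)]
    for r in relations:
        acc &= chunk_bits[r]
    return popcount(acc)

def match_heuristic(conf, bits_per_chunk, s, total, n, t):
    """Sequential test of the hypothesis #match(conf, E) >= n."""
    q = n / total                   # match count fraction aimed for
    d = 0                           # number of elements checked
    u = 0                           # current match count
    for chunk_bits in bits_per_chunk[:-1]:
        u += chunk_match(conf, chunk_bits)
        d += s
        margin = t * math.sqrt(d * q * (1 - q))
        if u > d * q + margin:
            return True             # enough matches are likely
        if u < d * q - margin:
            return False            # too few matches are likely
    # No early decision: finish with the exact count.
    return u + chunk_match(conf, bits_per_chunk[-1]) >= n
```

With a very large t the margins are never exceeded and the function degenerates into an exact count, which is a convenient way to check an implementation.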



4.3 Finding Active Configurations 

During weight fitting and position evaluation the set of active configurations 
has to be computed quickly for a large number of positions. For this purpose, 
we represent the set of all configurations over R_A by a DAG G. Nodes in G
correspond to configurations, and arcs mark direct specializations. A detailed
example is shown in Fig. 3a. The selection algorithm just described computes all
configurations that occur at least n times in a set of training positions. This set
of valid configurations induces a sub-DAG G' of G. Given a position, all active 
configurations can be found by a depth-first search in G' starting at its root. 
During search, all visited configurations are marked and their active status is 
determined. The search stops in nodes that have been visited before or have been 
found inactive. This algorithm quickly finds all active configurations. However, 
the only relevant active configurations for evaluation purposes are those without 
active specializations, because generalizations are redundant. It is easy to extend 
the described algorithm accordingly by restricting its output to leaves of the 
active configuration sub-DAG. Fig. 4 illustrates the entire procedure.

Fig. 3. a) Configuration DAG for two features f_1, f_2 with range(f_1) = {0, 1} and
range(f_2) = {0, 1, 2}. r_i:k denotes the relation f_i(·) = k. b) Configurations
belonging to patterns over f_1 and f_2.

Fig. 4. Finding the most specific active configurations by depth-first search in
the configuration DAG: a) configuration sub-DAG G', b) active configurations and
left-to-right DFS numbers, c) most specific active configurations.
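The depth-first search for the most specific active configurations might look as follows. The sketch assumes configurations given as collections of (feature index, value) relations, positions given as tuples of feature values, and a set of valid configurations that is closed under taking subconfigurations. The names `build_dag` and `most_specific_active` are hypothetical.

```python
def build_dag(valid_confs):
    """Map each node (a frozenset of relations) to its direct specializations.

    The empty configuration is added as the root of the sub-DAG.
    """
    nodes = {frozenset(c) for c in valid_confs} | {frozenset()}
    children = {c: [] for c in nodes}
    for c in nodes:
        for r in c:
            parent = c - {r}        # direct generalization of c
            if parent in children:
                children[parent].append(c)
    return children

def most_specific_active(children, position):
    """Return the active configurations that have no active specialization."""
    status = {}                     # node -> active flag, computed once per node
    expanded = set()                # nodes whose children were already searched
    leaves = []

    def active(conf):
        if conf not in status:
            status[conf] = all(position[f] == k for f, k in conf)
        return status[conf]

    def dfs(conf):
        expanded.add(conf)
        has_active_child = False
        for child in children[conf]:
            if active(child):       # the search stops at inactive nodes
                has_active_child = True
                if child not in expanded:
                    dfs(child)
        if conf and not has_active_child:
            leaves.append(conf)     # leaf of the active sub-DAG

    dfs(frozenset())
    return leaves
```

Only the returned leaves need to contribute weights, since their generalizations are redundant for evaluation purposes.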



4.4 Reducing Complexity: Patterns 

So far, our focus has been on efficient ways for generating configurations and 
computing active configurations. Despite the optimization efforts, GenConf may 
still not be able to generate all valid configurations due to time or space limi- 
tations. Furthermore, a large number of generated configurations might prevent 
an efficient position evaluation, because too many configurations are active, or 
the configuration data needs too much memory. 

One solution to these problems is to increase the minimal match count n, 
until the number of generated configurations is manageable. This approach, how- 
ever, narrows the evaluation function’s view by focusing it on the most common 
phenomena. A compromise is to generate all valid configurations, choosing n high
enough to avoid over-fitting, and to reduce their number afterwards by looking
at their statistical significance with regard to winning chance prediction.² Another
option for reducing the number of configurations is to limit their size or to
choose subsets of the atomic feature set as the base for generating configurations.

Finally, considering sets of mutually exclusive configurations helps to reduce
the number of active configurations in order to speed up the evaluation
considerably. Let G be the complete configuration DAG for {f_1, ..., f_m} ⊆ A
(Fig. 3a), and let r_min and r_max denote the minimum/maximum range cardinality
of the features. Then the number of nodes in G is bounded by (1 + r_min)^m and
(1 + r_max)^m, and for any position the number of active configurations is 2^m.³
Thus, in case of complete configuration DAGs the DFS algorithm presented in
the last subsection seems to waste time by searching a large number of nodes
before it eventually returns the single active configuration we are interested
in. This observation motivates looking for a more efficient data structure. For
{f_1, ..., f_m} ⊆ A we collect all possible most specific configurations in a set
called pattern[f_1, ..., f_m], i.e.

$$ \mathrm{pattern}[f_1, \ldots, f_m] := \{\, r_{1,k_1} \wedge \ldots \wedge r_{m,k_m} \mid r_{i,k} = (f_i(\cdot) = k),\ k \in \mathrm{range}(f_i) \,\} $$

Configurations in pattern[f_1, ..., f_m] correspond to leaves of the complete
configuration DAG (Fig. 3b). Data related to these configurations can therefore be
stored in a table addressed by feature values. For instance, in Fig. 3b the table
index for pattern[f_1, f_2] with regard to position p is simply 3 · f_1(p) + f_2(p).
Checking whether a pattern configuration is valid only requires incrementing a
match counter stored in a table whenever a configuration is active, and com- 
paring the result with the minimal match count. Detecting whether a pattern 
configuration is active during weight fitting or positional evaluation is a matter 
of a fast index computation and one table access. Incremental updates of only
those indices which are influenced by moves speed up game-tree search further.
In summary, the flat table is the data structure of choice for storing information
regarding small and medium sized complete configuration sets. The fast access
encourages restricting configuration sets to patterns.
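The table addressing described above amounts to a mixed-radix index. The sketch below assumes that each feature f_i takes values 0, ..., radices[i] - 1; the function names are hypothetical. For two features with range sizes 2 and 3 it yields the index 3 · f_1(p) + f_2(p) of the Fig. 3b example.

```python
import math

def pattern_index(values, radices):
    """Mixed-radix table index of a tuple of feature values."""
    index = 0
    for v, r in zip(values, radices):
        index = index * r + v
    return index

def count_matches(positions, radices):
    """Match counter table: one increment per training position."""
    table = [0] * math.prod(radices)
    for p in positions:
        table[pattern_index(p, radices)] += 1
    return table

def updated_index(index, i, old, new, radices):
    """Incrementally adjust the index when feature i changes from old to new."""
    place = 1
    for r in radices[i + 1:]:
        place *= r                  # positional weight of feature i
    return index + (new - old) * place
```

A pattern configuration is valid if its counter reaches the minimal match count, so validity checking is a single pass over the training positions followed by one comparison per table entry.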

Large patterns require a more memory-efficient representation. In order to
avoid over-fitting, we are still only interested in configurations that match several
training positions. Consequently, large patterns are sparse. Fig. 5 outlines a very
fast and, to our knowledge, novel technique for accessing sparse data which
trades memory for speed. It is based on representing valid configurations as
index tuples (i_1, i_2). For a given position and pattern, i_1 and i_2 are computed
by splitting the pattern’s feature set into two parts and performing the index
calculations described above separately for each subset. Both indices are then
used for accessing a hash-table, in which data regarding configuration (i_1, i_2)
2 The general problem of deciding the relevance of variables in a multivariate regression 
model in advance is hard. Nevertheless, simple statistics like the feature’s correlation 
with the training position scores can serve as a reasonable first approximation. 

3 These numbers can be derived by adding lower/upper bounds for the number of
nodes/active configurations for each depth and applying the identity
$\sum_{i=0}^{m} \binom{m}{i} x^i = (1 + x)^m$.

Fig. 5. Fast sparse data access. Data regarding a configuration represented by two
indices i_1 and i_2 can be accessed quickly in two steps.



is stored. First, an offset is looked up in a table using index i_1. Then, this
offset, incremented by i_2, is used to access the hash-table. For the algorithm
to be correct, 1) unique hash-table entries have to be assigned to valid index
tuples, and 2) invalid index tuples must be detected. The first condition can be
met by choosing suitable offsets and a sufficiently large hash-table. In practice,
the following greedy algorithm for constructing collision-free hash-tables has
produced reasonable results: beginning with the most frequent i_1-values, offsets
are assigned to them in first-fit manner. That is, whenever a collision occurs
when attempting to occupy the hash-table entry offset(i_1) + i_2, all i_1 entries
claimed so far are erased and offset(i_1) is incremented before restarting. The
hash-table size must be greater than the sum of the maximal offset and maximal
possible value of i_2, in order to avoid accesses beyond table end. A simple way
for meeting condition 2) is to add the lock i_1 to hash entries for all valid tuples
(i_1, i_2) and to reject tuples (i_1, i_2) for which the lock stored in the accessed hash
entry does not match i_1. Locks of unused hash entries must be initialized with
a value different from any possible i_1 (e.g. -1). Finally, offsets for all i_1 which
are not the first component of any valid index tuple can be safely set to 0, since
all locks in the hash-table are different from those i_1 values.
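The two-step access with offsets and locks can be sketched as follows. The sketch makes two assumptions that are not in the text: the "frequency" of an i_1 value is taken to be the number of valid tuples that share it, and payloads are stored in a parallel list. The names `build_sparse_table` and `lookup` are hypothetical. Checking all slots of a row before claiming any of them is equivalent to the erase-and-restart step described above.

```python
def build_sparse_table(data, num_i1, max_i2):
    """Build offsets, locks and payloads for valid index tuples.

    data maps valid tuples (i1, i2) to payloads; i1 ranges over
    0..num_i1 - 1 and i2 over 0..max_i2.
    """
    rows = {}
    for i1, i2 in data:
        rows.setdefault(i1, []).append(i2)
    offsets = [0] * num_i1          # 0 is safe for i1 values without valid tuples
    occupied = set()
    # Greedy first-fit placement, most frequent i1 values first.
    for i1 in sorted(rows, key=lambda key: -len(rows[key])):
        offset = 0
        while any(offset + i2 in occupied for i2 in rows[i1]):
            offset += 1             # collision: try the next offset
        offsets[i1] = offset
        occupied.update(offset + i2 for i2 in rows[i1])
    size = max(offsets) + max_i2 + 1    # no access beyond the table end
    locks = [-1] * size                 # -1 differs from every possible i1
    payloads = [None] * size
    for (i1, i2), value in data.items():
        slot = offsets[i1] + i2
        locks[slot] = i1
        payloads[slot] = value
    return offsets, locks, payloads

def lookup(tables, i1, i2):
    """Return the payload of (i1, i2), or None if the tuple is invalid."""
    offsets, locks, payloads = tables
    slot = offsets[i1] + i2
    return payloads[slot] if locks[slot] == i1 else None
```

A lookup costs two table accesses and one comparison regardless of how sparse the pattern is, which is the point of trading memory for speed.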

Patterns may outperform configuration sets constructed by GenConf due to 
a much faster generation and evaluation of configurations. However, patterns 
suffer from their limited scope because patterns may miss essential generaliza- 
tions. This observation suggests building a hierarchy of patterns in order to 
quickly cover both general and specific position aspects. Since this approach
also increases the evaluation time, experiments have to tell which is the better
strategy for a given application.

5 Weight Fitting 

The previous sections discussed the generation of scored training positions and 
the selection of configurations. In order to conclude the evaluation function con- 
struction, we must show how to assign weights to configurations. 

If the number of weights is large or non-linear models are used, direct weight 
computation is no longer feasible. Instead, iterative methods have to be used 
for weight fitting, which are usually based on variations of the gradient descent
procedure. In each step, this procedure updates the current weight vector in 
direction of the negated gradient of the error function. If features are highly 
correlated, this simple algorithm is known to converge slowly. Faster conjugate 
gradient algorithms have been developed ifjj , that do not suffer from this problem. 
However, because the basic algorithm works sufficiently well in practice and 
is easier to implement, its application will be discussed in more detail in the 
remainder of this section. 

5.1 Basic Considerations 

In games, the purpose of evaluation functions is to estimate the winning chance 
for the player to move. This goal can be accomplished literally by constructing 
functions that map positions into [0,1]. Alternatively, the game may provide 
a numerical scoring of terminal positions reflecting the win size. In this case, 
a reasonable evaluation objective is to estimate the final game score. In either 
case, experiments should be conducted to find a suitable link function g. The 
most commonly used candidates are the identity function and sigmoid functions 
of the form g(x) = 2C / (1 + exp(-x)) - C. For instance, for modeling the winning
chance an S-shaped link function g : ℝ → [0, 1] can be used in order to deal
with saturation. In this regard, g(x) = 1/(1 + exp(-x)) is of special interest,
because the weight fitting process benefits from a quickly computable derivative
of g, which in this case is g(x)(1 - g(x)). A straightforward scoring scheme for
terminal positions in this model assigns 0.9 to won positions, 0.5 to draws, and 
0.1 to lost positions for the player to move. It is important to realize that an 
optimal weight vector may not exist if the extreme values 1.0 and 0.0 are chosen. 
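The derivative identity quoted above follows in two steps, and the same formula suggests why the extreme targets are problematic. The following is a worked sketch that uses only the logistic link function from the text:

```latex
g(x) = \frac{1}{1 + e^{-x}}, \qquad
g'(x) = \frac{e^{-x}}{(1 + e^{-x})^2}
      = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}}
      = g(x)\,\bigl(1 - g(x)\bigr).
% Since 0 < g(x) < 1 for every finite x, a target r = 1 or r = 0 is only
% approached as x -> +infinity or x -> -infinity. The squared error of such a
% position can then be reduced indefinitely by ever larger weights, so a
% minimizing weight vector need not exist.
```

With targets such as 0.9 and 0.1 the required sums stay finite, for example g(x) = 0.9 holds at x = ln 9.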

Given a sequence of scored training positions $((p_k, r_k))_{k=1}^{N}$, the objective is to
find a weight vector w which minimizes the error function

$$ E(w) := \frac{1}{N} \sum_{k=1}^{N} \Delta_k(w)^2 \qquad (1) $$

where

$$ \Delta_k(w) := r_k - g\Big(\sum_{i=1}^{n} w_i h_{i,k}\Big) \quad \text{and} \quad h_{i,k} := \mathrm{val}(c_i(p_k)). $$

Starting with an initial guess $w^{(0)}$, in each step the basic gradient descent
procedure updates the weight vector according to

$$ \delta^{(t)} = -\alpha \cdot (\mathrm{grad}_w E)(w^{(t)}) \quad {}^{4} $$

$$ w^{(t+1)} = w^{(t)} + \delta^{(t)}. $$

α > 0 is the step size and grad_w E is the vector consisting of E’s partial
derivatives ∂E/∂w_i. This update scheme changes the weights in direction of the
error function’s steepest descent and is widely used for training artificial
neural networks.

In this application, the partial derivatives have a simple form due to GLEM’s
flat evaluation structure:

$$ \frac{\partial E}{\partial w_i}(w) = -\frac{2}{N} \sum_{k=1}^{N} g'\Big(\sum_{j=1}^{n} w_j h_{j,k}\Big)\, \Delta_k(w)\, h_{i,k}. \qquad (2) $$

If g is the identity function, this expression reduces to

$$ \frac{\partial E}{\partial w_i}(w) = -\frac{2}{N} \sum_{k=1}^{N} \Delta_k(w)\, h_{i,k}. $$

Thus, steepest descent updates for all weights can be computed efficiently in a
single pass through the training data. It is worth noting that the computation
of (2) can be arranged in such a way that its run time depends on the number
of h_{i,k} different from 0, rather than on N. Especially when using patterns, the
savings thus achieved are significant.

Since the configuration match count may vary by large factors, the described 
update step changes weights at very different speeds. This is undesirable, because 
at one point the iteration process has to be stopped, and by then, weights of 
rare but important configurations might not have reached a proper level yet. A 
simple way to deal with this problem is to normalize the updates by dividing
the sum by the number of h_{i,k} ≠ 0 instead of N.
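A single-pass gradient computation with the normalized update might be sketched as follows. The sketch assumes the logistic link function, training samples given as (list of active configuration indices, target score), and a fixed number of steps; the names `g` and `fit`, the step size and the step count are illustrative choices, not values from the paper.

```python
import math

def g(x):
    """Logistic link function."""
    return 1.0 / (1.0 + math.exp(-x))

def fit(samples, n_weights, alpha=0.5, steps=2000):
    """Gradient descent with updates normalized by configuration match counts.

    samples: list of (active configuration indices, target score r_k).
    """
    w = [0.0] * n_weights
    # Number of positions with h_{i,k} != 0 for each configuration i.
    counts = [0] * n_weights
    for active, _ in samples:
        for i in active:
            counts[i] += 1
    for _ in range(steps):
        grad = [0.0] * n_weights
        for active, r in samples:
            y = g(sum(w[i] for i in active))
            delta = r - y                            # Delta_k(w)
            common = -2.0 * y * (1.0 - y) * delta    # -2 * g'(.) * Delta_k(w)
            for i in active:                         # only h_{i,k} != 0 contribute
                grad[i] += common
        for i in range(n_weights):
            if counts[i]:
                # Divide by the match count instead of N, so that rare
                # configurations move as fast as frequent ones.
                w[i] -= alpha * grad[i] / counts[i]
    return w
```

The inner loops touch only active configurations, so the cost of one pass is proportional to the number of non-zero h_{i,k}.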

5.2 Position Type Dependent Weights 

The evaluation of configurations may depend on the game stage or, more gen- 
erally, on the particular type of the position. For instance, centralizing the king 
in chess openings is considered suicide, whereas his activation is crucial in many 
endgames. It may therefore be worthwhile to partition the training set accord- 
ing to position type, and to select configurations and fit weights separately for 
each set. In order to avoid big evaluation jumps when crossing type boundaries, 
which can cause undesired artifacts in game-tree search, it is helpful to define
fine-grained position types and to smooth evaluations across adjacent types. Fitting

4 Adding $\beta \cdot \delta^{(t-1)}$, known as “momentum”, can improve the convergence in case
of correlated features.

weights for many position types, however, requires a large number of training
positions, provided the minimal match count is maintained in order to eliminate
over-fitting. Globally lowering the match count is therefore not an option.
Instead, a more local view can help to reduce the number of needed positions. 
One suggestion when fitting weights for a particular position type, is to con- 
sider the training positions from adjacent types as well. This method increases 
the number of positions for any single position type and weights are smoothed 
automatically. The second option is to fit position type dependent weights in 
a more flexible manner. For this purpose, valid configurations are generated by 
considering all training positions. The weight fitting process then decides how
to compute the configuration weights separately for each type of position. For
any type for which the particular configuration match count is sufficiently high
(say > 20), it is safe to fit the corresponding weight as described in the previous
subsection. If the count is small (say < 4), over-fitting is likely and the
configuration should be treated as if there is no information available, i.e. the
weight is set to 0. Cases in between can be handled by merging adjacent position
types, until the total match number allows a robust weight fit. Here, the
alternatives are to have only a single weight for all involved types or, if there
are enough positions available, to fit a parameterized weight model. An example
for such a model is w(k) = a · k + b, k_0 ≤ k ≤ k_1, which states a linear
relationship between the weight and the position type k, coded as an integer,
in [k_0, ..., k_1].
Of course, this kind of model is only meaningful for position types that can be 
totally ordered, such as opening, middle-game, and endgame. Incorporating the 
update of parameters a and b in the gradient descent procedure is not hard. 
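The update of a and b can be written down with the chain rule. The following sketch assumes that the error E depends on a and b only through the type-dependent weights w(k) = a · k + b of one configuration:

```latex
\frac{\partial E}{\partial a}
  = \sum_{k=k_0}^{k_1} \frac{\partial E}{\partial w(k)} \cdot \frac{\partial w(k)}{\partial a}
  = \sum_{k=k_0}^{k_1} \frac{\partial E}{\partial w(k)} \cdot k,
\qquad
\frac{\partial E}{\partial b}
  = \sum_{k=k_0}^{k_1} \frac{\partial E}{\partial w(k)}.
```

Here ∂E/∂w(k) stands for the partial derivative of the previous subsection, restricted to training positions of type k, so both sums can be accumulated in the same pass through the training data.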

This technique allows a flexible and robust fitting of position type dependent 
weights. After generating training positions and selecting configurations, this 
concludes the evaluation function construction. 



6 Application: Othello 



The presented general framework for the construction of evaluation functions 
has been inspired by the work on our Othello program Logistello. Besides the 
progress in selective search and automated opening book construction, the ap- 
plication of the techniques discussed has contributed to the considerable playing 
strength of this program. Logistello is able to beat the best human Othello 
players handily, even when running only on ordinary hardware [2]. The details
of Logistello’s evaluation function have already been discussed in [1]. We will
therefore only give a short overview and concentrate on its recent improvement, 
which is based on the sparse pattern approach presented above. 

Othello is a popular Japanese board game, played by two players on an 8x8
board using 64 two-colored discs. Moves consist of placing one disc on an empty
square and turning all bracketed opponent’s discs over. Fig. 6 shows an example.
The game ends when neither player has a legal move, in which case the player
with the most discs on the board has won.

Fig. 6. Example positions. Legal moves are marked with a dot.



The most important concepts in Othello are disc stability, mobility, and par- 
ity. In particular: 

— Stable discs can not be flipped by the opponent. Therefore, they directly 
contribute to the final score. The most prominent stable discs are occupied 
corners, which can be used as anchors for creating more stable discs. 

— Having fewer move options than the opponent is dangerous, because it in- 
creases the chance of losing a corner in the near future. 

— Making the last move in an Othello game is advantageous, since it increases 
one’s own disc count while decreasing the number of opponent’s discs. Parity 
generalizes this observation by considering last move opportunities for every 
empty board region. 

In [1] it has been shown that all of these features can be quickly approximated
by pattern configurations built upon a raw board representation. The chosen
patterns are shown in Fig. 7. Horizontal, vertical, and diagonal lines of length
> 4 are included for covering mobility. The remaining patterns deal with the important
corner regions and edges. The evaluation function distinguishes 13 game
stages, depending on the number of discs on the board. Applying the techniques
described in the previous sections, about eleven million scored training positions
were generated to fit approximately 1.5 million weights. This figure takes weight
sharing among symmetrical configurations into account. Starting with $w^{(0)} = 0$,
the weight fitting procedure took a Pentium II/333 CPU about 30 hours to
reach an acceptable accuracy level after 250 iterations. Equipped with an evaluation
function very similar to the one we have just described, Logistello beat
the human Othello World-champion 6-0 in August 1997 [2]. After four years of
successful tournament play, Logistello ended its career in October 1997 with
a straight 22-win victory in its last computer Othello tournament.

Fig. 7. Logistello’s previous pattern set. Patterns that can be obtained by rotating
and mirroring the board have been omitted. Each diamond represents an atomic feature
f with range {0, 1, 2}. f(p) is defined by the particular square contents (e.g. white disc
↦ 0, empty ↦ 1, black disc ↦ 2).

Fig. 8. Large patterns tested. For each of these patterns the simplified pattern version
of GenConf generated about 88,000 valid configurations (#E ≈ 11 million, n = 75).
All configuration sets fit in hash-tables with about 310 thousand entries.

Recently, the incorporation of larger patterns has improved the evaluation
performance. In the current implementation, configuration weights are represented
as 16 bit integers. Storing weights for 10-square patterns in 13 flat tables
thus requires $3^{10} \cdot 2 \cdot 13 \approx 1.5$ million bytes. Using the same approach for storing
weights for much larger patterns is therefore out of the question. The first experiments
with several sparse data access schemes based on binary search were
disappointing. Increasing the program’s knowledge by adding the patterns shown
in Fig. 8 could not compensate for a slowdown of about 45%. Only after utilizing
the fast hash-table access scheme and adding just one of the three features did the
program achieve its best performance so far. Table 1 summarizes the results of
all tournaments that have been played to evaluate each version. All games were
played by brute-force versions of Logistello running on Pentium II/333 PCs.
On this hardware Logistello achieves a middle-game speed of approximately
270K nodes/sec when the patterns shown in Fig. 7 are used. This speed enables
the program to look 12-14 ply ahead in the opening and middle part of ten-minute
games.
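
The storage estimate for flat weight tables can be checked with a short calculation. The sketch below assumes that a configuration is addressed by reading its square contents as base-3 digits (white 0, empty 1, black 2, as in the Fig. 7 caption). The function names and the digit order are illustrative choices, not taken from Logistello.

```python
from typing import Sequence


def table_bytes(squares: int, bytes_per_weight: int = 2, stages: int = 13) -> int:
    """Memory needed for flat tables: one weight per configuration and stage."""
    return 3 ** squares * bytes_per_weight * stages


def config_index(contents: Sequence[int]) -> int:
    """Read the square contents (each 0, 1, or 2) as a base-3 number."""
    index = 0
    for value in contents:
        index = index * 3 + value
    return index


print(table_bytes(10))          # 10-square pattern: about 1.5 million bytes
print(table_bytes(16))          # 16-square pattern: more than a gigabyte
print(config_index([2, 0, 1]))  # index of a small three-square configuration
```

The jump from roughly 1.5 million bytes to more than a gigabyte for a 16-square pattern illustrates why flat tables are ruled out for the large patterns and a sparse representation is needed.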

The patterns presented in Fig. 8 were chosen based on both game and evaluation
speed considerations. Human players frequently make use of their abilities
to evaluate large disc formations which are not covered by the basic patterns.
Of special interest are edge interactions and 2 x 8-corner configurations. On the
other hand, it is preferable to add patterns for which the index computation can
make use of already determined indices. The chosen 16-square patterns meet this
preference. Nevertheless, the results show that the combined knowledge coded
in the new patterns does not compensate for the speed drop. This finding indicates
that a significant improvement of a sequential program may not be possible
by adding further patterns based on the raw board representation. However,
more effective atomic features might exist which in combination outperform the
current evaluation function.

Table 1. Tournament results. Logistello using the basic patterns played 434-game
tournaments against several versions that additionally employed the large patterns
shown in Fig. 8. The results indicate that speed matters. The strongest versions
are those that only use either pattern A or B. They beat the previous version significantly,
although they are 11% slower. When playing at equal strength the best version
only needs to search about 2/3 of the nodes, as the results of the time-handicap
tournaments indicate.

opponent   time/game   #nodes       opponent results          winning
           (minutes)   (fraction)   wins   draws   losses     percentage
A          10-10       0.89         213    58      163        55.8
B          10-10       0.89         211    60      163        55.5
AB         10-10       0.83         203    60      171        53.7
ABC        10-10       0.8          211    49      174        54.3
A          6-10        0.51         172    59      203        46.4
A          7-10        0.62         183    55      196        48.5
A          8-10        0.71         195    63      176        52.2
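
The winning percentages in Table 1 are consistent with counting a draw as half a win. This scoring rule is inferred from the numbers and is not stated in the text; the function name is illustrative.

```python
def winning_percentage(wins: int, draws: int, losses: int) -> float:
    """Percentage score with a draw counted as half a win (assumed rule)."""
    games = wins + draws + losses
    return 100.0 * (wins + 0.5 * draws) / games


# Rows of Table 1: (opponent, time/game, wins, draws, losses)
rows = [("A", "10-10", 213, 58, 163), ("B", "10-10", 211, 60, 163),
        ("AB", "10-10", 203, 60, 171), ("ABC", "10-10", 211, 49, 174),
        ("A", "6-10", 172, 59, 203), ("A", "7-10", 183, 55, 196),
        ("A", "8-10", 195, 63, 176)]
for name, time, w, d, l in rows:
    print(name, time, w + d + l, round(winning_percentage(w, d, l), 1))
```

Each row sums to 434 games, matching the tournament size given in the caption.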

7 Summary and Discussion 

In this paper a practical framework for the semi-automatic construction of eval- 
uation functions has been presented. Based on a generalized linear evaluation 
model — called GLEM — efficient procedures have been developed for generat- 
ing training positions, exploring the feature space, and fitting feature weights. 
Rather than combining a few features by using complicated non-linear functions,
we propose to construct evaluation functions by combining many features (possibly
more than a hundred thousand), which are boolean combinations of
atomic relations. This approach allows us to model non-linear effects directly,
without the detour over analytic functions, and opens up practical ways for gen- 
erating features automatically. GLEM allows the program author to concentrate 
on the part of evaluation function construction, where humans excel: the dis- 
covery of fundamental positional features by reasoning about the game. GLEM 
simplifies this task because the exact feature formulation is no longer needed. 
The system is able to approximate complex features by combining atomic frag- 
ments. In this way, it is now possible for the programmer to speculate about 
feature building blocks and to leave the creation of actually used features as well 
as assigning weights to them to the system. One example for this strategy has 
been presented in this paper: the observation that configurations can approximate
important Othello concepts, combined with the “mechanical” analysis of
millions of training positions, has produced an expert program capable of beating
any human player. An interesting fact is that the game knowledge encoded by
the set of over a million configuration weights goes far beyond the features we
intended the system to approximate in the first place [1]. This result encourages
the application of GLEM to other games or even to search or decision problems
in other domains. Attractive candidates are chess and Go since both games are
very popular and well analyzed. And yet, for chess, hardware roughly equivalent
to four thousand ordinary PCs is currently needed to compete with the human
World-champion. For Go the status is even worse because brute-force search is
not feasible due to the large branching factor. Since a good evaluation function
is not known either, amateurs are still able to beat the best Go programs. It is
our opinion that the key to better chess and Go programs lies in improved evaluation
functions. A starting point is the analysis of known features with regard
to their approximation by weighted configurations as proposed by GLEM.

The automatic construction of features has been studied by several authors. 
Utgoff E2 proposes a general evaluation function learner, called ELF, which 
combines the processes of constructing boolean feature combinations and weight 
fitting. This approach has been shown to be effective in small artificial problems, 
but could not convince in its application to checkers. The main problem of 
ELF is its low speed. Taking into account the large number of features needed 
for an adequate evaluation in complex domains, and the resulting considerable 
effort for optimizing weights, it seems hopeless to combine feature construction 
and weight fitting. Other approaches for constructing features or adapting the 
combination function while fitting weights (e.g. Morph jSJ, meiosis networks |3J, 
node splitting UH) , face similar complexity problems. Our solution is to separate 
these tasks in order to speed-up the process and to give many opportunities for 
optimization.

Abstract. It has been said to be very useful for Go playing systems to
have knowledge. We focus on pattern level knowledge and propose a new
model of pattern acquisition based on our cognitive experiments. The
model consists of two steps: a pattern acquisition step, using only positive
examples, and a pattern refinement step, using both positive and negative
examples. The latter step acquires precise conditions for applying patterns and/or the
way of resolving conflicts. This model has advantages in computational
time and precise control for conflict resolution. One algorithm is given for
each step, and since each algorithm can be changed independently, it is possible to
compare algorithms with this model. Three algorithms are introduced for
the first step and two for the second step. Patterns acquired by this model
are applied to Tsume-Go problems (life and death problems) and the
performance of the six conditions is compared. In the best condition,
the percentage of correct answers is about 31%. This result equals the
achievement of one dan human players. It is also shown that the patterns
enhance search techniques when the search space is very large.
Keywords: knowledge acquisition, evolutionary learning, pattern acqui- 
sition, Go, Tsume-Go 



1 Introduction 

1.1 Purpose 

Studies on games have mainly focused on search techniques, and these techniques
enable systems playing most games, such as chess, checkers, and Othello,
to perform as well as human experts. In these games, full-width search to some
depth is very common. Full-width search is, however, very difficult to apply in 
games such as Go and Shogi (Japanese chess), whose branching factors are about 
250 and 80, respectively, as compared with the 35 of chess. Therefore, selection 
of moves is indispensable. One promising method for selecting moves is to use 
knowledge. 

We have classified Go knowledge into two major levels based on cognitive 
studies: pattern level knowledge and language level knowledge EU Pattern level 
knowledge includes patterns and sequences of moves. A pattern is a rule whose 
condition part is a partial board configuration, and whose action part is a move.
A sequence of moves is a rule suggesting several moves. It is said that they are
very useful for human players but are missing in current computer systems HIE 
Language level knowledge is a verbal rule consisting of Go terms used by human 
players. 

In this study, we focus on pattern level knowledge, especially patterns. Al- 
though patterns have often been used in Go playing systems j2j, most of them 
have been entered by programmers and it is very hard to input enough patterns 
to replicate the “skill” of human players. Instead, some studies have examined
automatic pattern acquisition nang. However, when the patterns are used in 
the actual games, too many patterns are matched, and mechanisms to resolve 
the conflicts among patterns are needed. These mechanisms are computationally 
expensive and hard to implement. Therefore, we propose, in this paper, a new 
model of acquiring patterns that also learns the conflict resolution mechanisms. 
Too many patterns are matched in a situation because the
patterns contain only a partial board configuration as the matching condition.
We assume that patterns with additional information, such as detailed descriptions
of when the patterns are applied or the priority of patterns, can
substitute for the conflict resolution mechanisms. This process can be considered
a refinement of the rules. Thus we propose a model of pattern acquisition
where patterns are acquired and the acquired patterns are refined, as explained
in Subsection 1.2.

The effectiveness of these refined patterns is investigated by solving Tsume- 
Go problems (life and death problems of Go), because performance is easy to 
measure and compare. This paper compares the performance of our model to 
that of human players. 

Most systems solve Tsume-Go by searching almost all possible moves without 
using knowledge to narrow the search space. GoTools d, which is one of the 
best Tsume-Go solvers, uses patterns only at the end node of the search tree. 
We narrow the search space with patterns by finding the first move. In general, 
you can narrow more if you narrow earlier, so narrowing the first move narrows 
the most. We use patterns to select candidates of the first move of sequences of 
Tsume-Go answer moves. A related work is that of Sasaki d, wherein Tsume- 
Go problems are solved using Neural Networks without search. 

1.2 A 2-Step Model of Pattern Acquisition 

Our cognitive studies HZ) have shown that kyu level (beginner level) amateur 
players have patterns whose condition is only a part of board configuration, 
and that dan level (expert level) amateur players have patterns with precise 
conditions when the rules that can be described by Go terms can be applied. 

1 Sequences of moves are important and helpful for humans and computers in Go, as
well as in chess. One reason is that they are helpful for “answer moves”, your response
to an opponent’s move. Another reason is that in Go there are many sequences which
cannot be stopped halfway. For example, a sequence that yields good results if
completed may lead to bad results if stopped halfway.

We propose the following assumption: when players are weak, they acquire
patterns by reading books or watching good players’ moves. However when they 
actually apply the acquired rules, they sometimes fail. This is because the con- 
ditions of the acquired patterns are inappropriate and they sometimes apply a 
rule in the wrong situation. Another reason is that they do not know the priority 
of rules and they apply less important rules. As a consequence of trial and error, 
they gradually learn more precise rule conditions or priority. 

Therefore, we propose the following two-step model to acquire proper pat- 
terns. The first step is to mainly acquire patterns. This is the pattern acquisition 
step. In the second step, some additional information on the patterns, such as 
the conditions under which the rules can be applied or priority to resolve con- 
flicts among rules, are added in order to refine the rules. This can be considered 
as the pattern refinement step. 

This is a concept-level description of the model. When the model is imple- 
mented, we should create an algorithm for each step. In this paper, the first 
algorithm acquires patterns whose conditions are only partial board configura- 
tions. It only requires positive examples. The second algorithm, however, requires 
positive and negative examples, because they are learned as a consequence of 
trial and error. These algorithms can change independently. 

1.3 Procedures of This Paper 

This model needs two algorithms: one is for the first step and the other is for the 
second step. We introduce three algorithms for the first step in Section 2 and
two algorithms for the second step in Section 4. We carry out six experiments
(2 x 3) to compare these algorithms by solving Tsume-Go problems in Section 
Q In Section El human performance against the Tsume-Go problems is shown 
in a comparison to that of our system. 



2 Pattern Acquisition Algorithm 

This section introduces three algorithms for the first step of pattern acquisition. 
The first algorithm acquires flexible patterns whose shape and size are widely 
varied, and is called “Flexible Algorithm” in this paper. The other two algorithms
acquire patterns of fixed size and shape. One of them, called “Fixed Algorithm”,
always acquires patterns in much the same size and shape. This type of pat- 
terns are often used in the previous studies m- The other, called “Semi-Fixed 
Algorithm”, acquires almost fixed patterns but the size and shape of acquired 
rules are slightly varied between patterns. These three algorithms are explained 
in this section. 



2.1 Flexible Algorithm 

The first algorithm acquires flexible rules. This algorithm was initially proposed 
to acquire flexible patterns from game records 0. Although Tsume-Go problems
and their answers can be considered game records of a few moves (usually one
or three moves), the small number of Tsume-Go problems causes overfitting 0. 
Thus a new mechanism is implemented to avoid overfitting. This subsection 
briefly explains the Flexible Algorithm. For details, see HQ. 



Overview of the Algorithm The algorithm’s aim is to acquire useful rules, 
herein individuals , which match the given training data, herein food. Each rule 
takes the form of a production rule, consisting of an IF-part (consisting of con- 
ditions) and a THEN-part (consisting of an action), and has an activation value. 
There are no rules in the initial state. 

Rules matching a training datum get food, and their activation values in- 
crease. If there are no rules matching a given datum, a new rule matching the 
datum with only one condition is created. The number of rules with only one 
condition thus increases at an early stage. 

After being fed, rules with an activation value over a threshold are split into
two rules: the original one and a more complex one. More complex rules are thus
created by splitting, and the number of rules increases.

Every rule consumes food at each step, so its activation value decreases by one.
Rules whose activation value is 0 die. Thus, rules that are too complex and get
food too rarely die. As a result, all the rules are expected to get food at almost
the same frequency. The procedure of this algorithm is shown below.

Algorithm 1.1 Pattern Acquisition Algorithm

step ← 1
while step ≤ the number of iterations
    choose a random game record from the game database
    move ← 1
    while move ≤ the number of moves in the game
        a training datum ← move-th move
        if no rule matches the datum
            then create a new rule
            else feed matched rules
        for all rules
            if activation of a fed rule > threshold
                then split the rule
            activation of a rule ← activation of a rule − 1
            if activation of a rule = 0
                then the rule dies
        end for
        move ← move + 1
        step ← step + 1
    end while
end while
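
The loop of Algorithm 1.1 can be sketched in runnable form as follows. This is a simplified illustration, not the authors' implementation. It assumes that a training datum is a non-empty set of facts around the move, that a rule is the set of facts in its IF-part (the action is always the move itself), that a newly created rule starts with activation INI_ACT, and that the memory mechanism is left out. The name `acquire` and the string encoding of facts are hypothetical.

```python
import random
from typing import Dict, FrozenSet, List

Datum = FrozenSet[str]   # hypothetical encoding: facts true around one move
Rule = FrozenSet[str]    # IF-part of a rule; the THEN-part is play([0,0])


def acquire(games: List[List[Datum]], iterations: int, food: float = 1000.0,
            threshold: float = 2000.0, ini_act: float = 100.0,
            seed: int = 0) -> Dict[Rule, float]:
    """Return the surviving rules with their activation values."""
    rng = random.Random(seed)
    rules: Dict[Rule, float] = {}
    step = 1
    while step <= iterations:
        game = rng.choice(games)
        for datum in game:                       # one training datum per move
            matched = [r for r in rules if r <= datum]
            fed: List[Rule] = []
            if not matched:
                # Create a rule with one condition chosen at random.
                rules[frozenset([rng.choice(sorted(datum))])] = ini_act
            else:
                # Feed only rules without a more specific matched rule.
                fed = [r for r in matched if not any(r < s for s in matched)]
                for r in fed:
                    rules[r] += food / len(fed)
            for r in list(rules):
                if r in fed and rules[r] > threshold:
                    extra = sorted(datum - r)
                    if extra:                    # split: child gets INI_ACT
                        child = r | {rng.choice(extra)}
                        if child not in rules:   # no duplicate rules
                            rules[child] = ini_act
                            rules[r] -= ini_act
                rules[r] -= 1                    # every rule consumes food
                if rules[r] <= 0:                # <= 0 because shares are floats
                    del rules[r]
            step += 1
    return rules


demo_games = [[frozenset({"a", "b", "c"})], [frozenset({"a", "b"})]]
print(acquire(demo_games, iterations=200, seed=1))
```

As in the pseudocode, the step counter advances once per move, so the outer loop may overshoot the iteration limit by the remaining moves of the last game.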



2 When the volume of training data is small, less general rules that suit only the training
data tend to be acquired. This is usually called overfitting in machine learning.

Details of the Algorithm



Rules Each rule takes the form of a production rule consisting of an IF-part 
and a THEN-part. 

A rule is described in relative terms so that it can be executed by either
player. The eight equivalent board configurations (4 rotations by 90 degrees times 2 reflections)
are converted to one of the configurations. Rules are described as follows:

IF exist([$x_1$, $y_1$], $obj_1$) ∧ ... ∧ exist([$x_n$, $y_n$], $obj_n$) THEN play([0,0]),

where [$x_i$, $y_i$] represents relative coordinates from the action place³, whose coordinate
is always [0,0], and $obj_i$ is one of the four objects shown below:

1. SAME: the same color stone as the action stone⁴

2. DIFF: a stone with different color from the action stone

3. EDGE: an edge of the board⁵

4. −n: a previous move⁶ ($n \in \mathbb{N}$, $n > 0$)

The following rule is an example: “IF exist([-1,-1], -1) ∧ exist([0,-1], SAME)
∧ exist([-2,-1], DIFF) ∧ exist([0,-5], EDGE) THEN play([0,0])”. This rule is
shown in Figure 1. The active player is always shown as Black in this paper. The
action stone and the move are presented as the stone with the largest number. 
This stone is “2” in this example. This rule means that if the previous move 
(shown as “1”) is played to point [-1,-1] (relative to the action place) and a 
stone of the same color as the action stone (Black) exists at point [0,-1] and a 
stone of a different color from the action stone (White) exists at point [-2,-1] 
and an edge exists at point [0,-5], then put a stone on [0,0]. 




Fig. 1. An example of rules: IF exist([-1,-1], -1) ∧ exist([0,-1], SAME) ∧
exist([-2,-1], DIFF) ∧ exist([0,-5], EDGE) THEN play([0,0])



3 The place where a stone is placed according to the rule. 

4 The stone to be placed according to the rule. 

5 An edge is located just outside the board, not inside the board. 

6 “−3” represents the move played three moves before. This is used to acquire sequences
of moves.

Feeding Rules Among matched rules, those that do not have rules more specific
than themselves are fed. An activation value equal to Parameter FOOD is
equally shared among the fed rules. The following is an example.

Suppose that the following five rules matched a given training datum. 



1. IF $C_1$ THEN $A_1$

2. IF $C_1$ ∧ $C_2$ THEN $A_1$

3. IF $C_2$ ∧ $C_3$ THEN $A_1$

4. IF $C_4$ ∧ $C_5$ THEN $A_1$

5. IF $C_2$ ∧ $C_3$ ∧ $C_4$ THEN $A_1$

Rules 1 and 3 are not fed because they have more specific rules than them- 
selves (Rules 2 and 5, respectively). The others (Rules 2, 4, and 5) share food, 
and each gets one third of the food. 

As a consequence, although more general rules are matched more frequently,
this feeding method lowers the probability that they are actually fed.
As a result, overly general rules are not
acquired: they often match food, but rarely get it as long as more
specific rules exist.
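
The feeding example above can be reproduced with a few lines of code. The sketch assumes that a rule is more specific than another when it has the same action and a strict superset of its conditions. The function name `feed` and the string encoding of conditions are illustrative.

```python
from typing import Dict, FrozenSet, Tuple

Rule = Tuple[FrozenSet[str], str]   # (conditions of the IF-part, action)


def feed(matched: Dict[int, Rule], food: float) -> Dict[int, float]:
    """Share `food` equally among matched rules that have no more specific
    matched rule. Keys are rule numbers, values the activation gained."""
    def more_specific(a: Rule, b: Rule) -> bool:
        # a is strictly more specific than b: same action, more conditions
        return a[1] == b[1] and b[0] < a[0]

    fed = [i for i, rule in matched.items()
           if not any(more_specific(other, rule) for other in matched.values())]
    return {i: food / len(fed) for i in fed}


matched_rules = {
    1: (frozenset({"C1"}), "A1"),
    2: (frozenset({"C1", "C2"}), "A1"),
    3: (frozenset({"C2", "C3"}), "A1"),
    4: (frozenset({"C4", "C5"}), "A1"),
    5: (frozenset({"C2", "C3", "C4"}), "A1"),
}
print(feed(matched_rules, food=1000))   # Rules 2, 4, and 5 each get one third
```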



Splitting and Creating Rules A fed rule whose activation value is greater 
than a certain threshold value is split into two rules: one, called parent, is the 
original rule and the other, called child, is created by adding a new condition 
to the IF-part of the original rule. For example, the rule IF $C_1$ ∧ $C_2$ THEN $A_1$ is
split into the original rule and a newly created rule, IF $C_1$ ∧ $C_2$ ∧ $C_3$ THEN
$A_1$. The new condition, $C_3$, is chosen at random from among the objects on the
current board (stones, edges of the board, and the previous moves).

A child extracts a certain amount of the activation value from its parent. As
a result, after the split, the activation value of the rule changes as listed in Table
1. The total activation value, however, does not change after the split.



Table 1. Change of activation value after splitting.

          before    after
parent    P_ACT     P_ACT - INI_ACT
child     -         INI_ACT



If no rule is matched, a new rule is created which has only one predicate 
chosen randomly from among the objects on the current board. A newly created 
rule that is the same as a rule already in the rule set is deleted. There is thus 
no duplication of rules.

Avoidance of Overfitting: Memory The total number of training data of
Tsume-Go is much smaller than that of real games, because the number of 
moves (training data) in one Tsume-Go problem is usually 1, 3 or 5 and is much 
smaller than that in game records (about 207 51 ); moreover, there are fewer 
Tsume-Go problems than game records. Since the small number of training data 
usually causes overfitting, a new mechanism, memory , is introduced to avoid 
this problem. 

The memory avoids acquiring rules used by only a few training data. When
memory is used, the IDs⁷ of the last M training data which a rule matches are
recorded in the rule. When a rule matches a training datum whose ID is recorded,
it is not fed. By this mechanism, a rule is not fed again by any of the last M
training data recorded for it. M is usually set to 2 or 3.
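
A minimal sketch of the memory mechanism, assuming one bounded list of recent IDs per rule. The class name `RuleMemory` is hypothetical. Moving a repeated ID back to the most recent position is a design choice of this sketch, since the text does not say how repeated matches are recorded.

```python
from collections import deque
from typing import Deque, Tuple

DatumId = Tuple[int, int]   # (game ID, move number), as in footnote 7


class RuleMemory:
    """Remembers the IDs of the last M training data a rule has matched."""

    def __init__(self, m: int) -> None:
        self.recent: Deque[DatumId] = deque(maxlen=m)

    def may_feed(self, datum_id: DatumId) -> bool:
        """Record a match and report whether the rule may be fed by it."""
        allowed = datum_id not in self.recent
        if not allowed:
            self.recent.remove(datum_id)   # assumption: keep each ID only once
        self.recent.append(datum_id)       # the oldest ID drops out when full
        return allowed


memory = RuleMemory(m=2)
for datum_id in [(1, 1), (1, 1), (2, 1), (3, 1), (1, 1)]:
    print(datum_id, memory.may_feed(datum_id))
```

With M = 2, a rule that keeps matching the same one or two problems gains no activation and eventually dies, which is the intended protection against overfitting.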



2.2 Fixed Algorithm 

The second algorithm acquires fixed patterns. The shape of the acquired patterns
is the same, but the size can change. A pattern includes all the squares within
Manhattan distance d from the center and their adjacent 8 squares. The IF-part
is all the squares except the center, and the THEN-part is the center square, so
when a pattern is matched, the move is to the center square. Figure 2 shows the
shapes of patterns. In the figure, the numbers indicate the Manhattan distance
from the center square, ‘a’ means their adjacent squares, and ‘1’ means the
center square. A square is occupied by one of the following five objects: SAME,
DIFF, EDGE, EMPTY (the square is not occupied by any stone), and OB (the
square is out of the board). Note that empty squares are explicitly described in
this algorithm. In this paper, d ranges from 0 to 3. The number of conditions
according to d is as listed in Table 2.

Fig. 2. Shapes of patterns acquired by Fixed Algorithm.



7 ID of training datum is both game ID and move number. Training data of the same 
game but different moves have different IDs. 
Table 2. The number of conditions in a rule.

d                          0    1    2    3
The number of conditions   8   20   36   56
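
The counts in Table 2 follow directly from the shape definition: take every square within Manhattan distance d of the center, add the eight neighbours of each, and drop the center itself. The short sketch below (the function name is illustrative) reproduces them.

```python
from typing import Set, Tuple


def pattern_squares(d: int) -> Set[Tuple[int, int]]:
    """Relative coordinates of the IF-part squares of a fixed pattern."""
    core = {(x, y) for x in range(-d, d + 1) for y in range(-d, d + 1)
            if abs(x) + abs(y) <= d}              # Manhattan distance <= d
    region = {(x + dx, y + dy) for (x, y) in core
              for dx in (-1, 0, 1) for dy in (-1, 0, 1)}   # plus 8 neighbours
    region.discard((0, 0))                        # the center is the THEN-part
    return region


print([len(pattern_squares(d)) for d in range(4)])
```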



Patterns are acquired by a simple mechanism. When a training datum is 
given, a pattern for each d , whose center is the training move, is simply stored 
into a database. A total of four patterns (d = 0, 1, 2, 3) are stored. Eight rotations
are considered, and only one of them is stored.

When the patterns are used (for refinement explained in Section ^ or for 
solving Tsume-Go explained in Section 0, only the pattern with the largest d is 
used, which is considered as “fed” in Subsection 0. For example, if two patterns 
with d = 0 and d = 2 match a board configuration, only the pattern with d = 2 
is considered as “fed”. 

2.3 Semi-fixed Algorithm 

The third algorithm acquires semi-fixed patterns. Patterns acquired by this algorithm
are almost the same as those acquired by the Fixed Algorithm. The
difference is that it mainly considers empty squares. In our cognitive experiment
using an eye mark recorder, experts mainly solved Tsume-Go problems
by looking at empty squares and stones adjacent to the empty squares. This mechanism
is introduced in the Semi-Fixed Algorithm. The Manhattan distance is
how many steps you can walk vertically and horizontally from the center. In
this algorithm you walk only on empty squares. That is, when you count d, only
empty squares are counted and squares occupied by a stone are not counted.
The counted squares and squares adjacent to the counted ones are considered in
a rule. As a result, the number of objects in the IF-part varies. When there is
no stone within Manhattan distance d, the rule is the same as that acquired by
the Fixed Algorithm. An example is given in Figure 3.
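
One way to read "walking only on empty squares" is a breadth-first search that starts at the center and steps only onto empty squares, up to d steps. The sketch below follows that reading, which is an interpretation rather than the authors' stated procedure. The board encoding (a dictionary from coordinates to "EMPTY", "SAME", or "DIFF", with missing keys meaning off-board) and the function name are assumptions.

```python
from collections import deque
from typing import Dict, Set, Tuple

Square = Tuple[int, int]


def semi_fixed_squares(board: Dict[Square, str], center: Square,
                       d: int) -> Set[Square]:
    """Squares in the IF-part: empty squares reachable from the center in at
    most d steps over empty squares, plus their 8 neighbours."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        x, y = queue.popleft()
        if dist[(x, y)] == d:
            continue                       # do not walk further than d steps
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt not in dist and board.get(nxt) == "EMPTY":
                dist[nxt] = dist[(x, y)] + 1
                queue.append(nxt)
    region = {(x + dx, y + dy) for (x, y) in dist
              for dx in (-1, 0, 1) for dy in (-1, 0, 1)}
    region.discard(center)                 # the center is the move itself
    return region


empty_board = {(x, y): "EMPTY" for x in range(19) for y in range(19)}
print(len(semi_fixed_squares(empty_board, (9, 9), 3)))   # same as Fixed, d = 3

walled = dict(empty_board)
for y in range(19):
    walled[(10, y)] = "DIFF"               # a wall of stones next to the center
print(len(semi_fixed_squares(walled, (9, 9), 3)))        # fewer squares
```

On an empty board the result coincides with the fixed pattern, as the text requires, while stones near the center shrink the pattern.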



Fig. 3. An example of rules acquired by Semi-Fixed Algorithm, d = 3.




The mechanism of pattern acquisition is the same as that in the Fixed Algorithm.

3 Results of Pattern Acquisition

3.1 Methods 

The three algorithms explained in Section 2 were used to acquire patterns from
Tsume-Go problems. The training examples were 1,039 Tsume-Go problems
and their answers (3,993 moves in total; the average number of training data per
problem is 3.8)⁸.



Flexible Algorithm Training sets were chosen as follows. One problem was 
randomly chosen and all the correct moves of the problem were taken as training 
data from the first move to the last move. After all the moves were used as train- 
ing data, another problem was randomly chosen. This procedure was repeated. 
The time step is a single move in this simulation. The parameters are shown in
Table 3.



Table 3. Parameters used in acquiring Tsume-Go knowledge.

Name of Parameters         Value
INI_ACT                    100
FOOD                       1000
threshold                  2000
M (memory)                 2 or 3
the number of iterations   1,000,000



Fixed and Semi-fixed Algorithm All training problems were used once. For
each training datum, patterns with d = 0, 1, 2, and 3 were acquired. The number of
times each pattern was matched was recorded, and patterns that were matched more
than once were stored.

3.2 Results 

Flexible Algorithm Figure 4 lists examples of acquired rules. The number of
acquired rules is listed in Table 4. Some rules, called reliable rules⁹, were selected
for future evaluation as follows. After the acquisition process finished, all training
data were given once again as training data, and we checked how many times the
acquired rules were fed. Rules fed over M (memory) times were selected, because
rules fed less than M times will eventually die due to the memory mechanism.
10% of the reliable rules were randomly selected to be evaluated by two human
expert players¹⁰.

8 Both White and Black moves are used as training data. Tsume-Go problems are like
“next move problems” in chess. In the answer moves (training data), only a small
number of moves are given, but you have to think much deeper to find a correct
move.

9 Rules fed over 100 times are called reliable rules.




Fig. 4. Examples of acquired rules of Tsume-Go. 



Table 4. The number of acquired rules from Tsume-Go.

M   No. of acquired rules   No. of reliable rules   No. of evaluated rules
2   896                     649                     65
3   885                     569                     57



The results of the experts’ evaluation of the acquired rules are listed in Table
5. They show that the quality of acquired rules is very high.



Table 5. The results of expert evaluation of Tsume-Go rules.

expert   M   good       average    bad        total
A        2   21 (32%)   29 (45%)   15 (23%)   65 (100%)
         3   22 (39%)   15 (26%)   20 (35%)   57 (100%)
B        2   37 (57%)   17 (26%)   11 (17%)   65 (100%)
         3   35 (62%)   19 (33%)    3 ( 5%)   57 (100%)



Fixed and Semi-fixed Algorithm 1252 rules were acquired by the Fixed
Algorithm and 1512 rules by the Semi-Fixed Algorithm. Since these rules are
fixed and hard for human experts to evaluate, they were not evaluated.



10 Both are strong amateur players, 5 dan and 6 dan.
4 Rule Refinement Algorithm

Our cognitive studies m show that the patterns which players stronger than 
2 dan have contain conditions that use Go terms and that those which 1 dan 
players have contain few such terms. We assume that 1 dan players use only a 
simple priority of patterns. Therefore, we assign numbers to rules in two ways. 
One is to assign importance or weight. The other is to use probability of accuracy. 
These algorithms are explained in this section. 



4.1 Assignment of Weights to Rules 

In this algorithm, weights, indicating how important the rules are, are automat- 
ically assigned to the rules acquired in the first step. 



Methods A weight is an integer. In the initial state, all rules are given the same 
weight, 200. When the value is larger than this, it takes more time to assign the
weights. When the value is too small, most rules tend to have similar values and 
discrimination does not work properly. The initial value of weights, 200, was 
decided after some preliminary experiments. 

The points of Move m are calculated by summing all the weights of rules 
that are fed by Move m. After all the points of possible moves are calculated, 
the moves are ranked in descending order of points. A move that duplicates a 
training datum is called a correct move. 

Initially given weights are automatically adjusted in the following way. The 
value of one is subtracted from the weights of all the rules fed by moves ranked 
higher than the correct move, and the value of S/n is added to the weights of 
all the rules fed by the correct move, where S is the total subtracted values and 
n is the number of rules fed by the correct move. 

The adjustment of the weights does not change the total amount of the 
weights of all rules, because the total subtraction of weights is S, and the total 
addition is S/n x n = S. 

We use Table 6 as an example to explain the adjustment. Each Move i is a
candidate move, ranked i-th by the sum of the current
weights of the rules R_ij that are fed by Move i. Numbers in the table indicate
the weight of each rule. In this example the correct move is in the fourth rank;
thus, all the weights of the rules fed by Moves 1, 2, and 3, which are all negative
examples, are decreased by one and the total subtraction is 8. The number of
rules fed by the correct move is two, thus 8/2 = 4 is added to each rule. As a
result, the correct move will be in the third rank. Repeating these procedures is
expected to make the weights more appropriate.

This realizes automatic adjustment of the degree of change because a larger 
amount is given to the rules fed by the move when the rank of the correct move 
is low, and a smaller amount is assigned when the rank of the correct move is 
high. 
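
The adjustment rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `adjust_weights` is hypothetical, each move is assumed to feed its own separate list of rule weights (no rule is shared between moves), and S/n is taken as an exact division because the text does not say how a non-integer result would be rounded. The input data are the values of Table 6.

```python
def adjust_weights(ranked_moves, correct_rank):
    """Return adjusted rule weights for one training example.

    ranked_moves: one list of rule weights per candidate move, ordered by
                  descending points (index 0 is the top-ranked move).
    correct_rank: index of the correct move in ranked_moves.
    Assumption (not from the paper): no rule is shared between moves.
    """
    n = len(ranked_moves[correct_rank])
    if n == 0:
        # No rule is fed by the correct move, so nothing can be rewarded;
        # leave all weights unchanged to keep the total constant.
        return [list(weights) for weights in ranked_moves]
    # S: one unit is subtracted from every rule fed by a higher-ranked move.
    total_subtracted = sum(len(w) for w in ranked_moves[:correct_rank])
    bonus = total_subtracted / n
    adjusted = []
    for rank, weights in enumerate(ranked_moves):
        if rank < correct_rank:
            adjusted.append([w - 1 for w in weights])
        elif rank == correct_rank:
            adjusted.append([w + bonus for w in weights])
        else:
            adjusted.append(list(weights))
    return adjusted


before = [[230, 210, 180], [200, 180, 170], [200, 180], [195, 180]]
after = adjust_weights(before, correct_rank=3)
points = [sum(weights) for weights in after]
print(after[3], points)
```

With these inputs the two rules of the correct move each gain 4, the total weight over all rules stays the same, and the correct move's points (383) overtake those of Move 3 (378), which moves it from the fourth to the third rank.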
Table 6. Adjustment of priority values.

Candidate move    Priority values before adjustment       Priority values after adjustment
Move 1            R_11 = 230, R_12 = 210, R_13 = 180      R_11 = 229, R_12 = 209, R_13 = 179
Move 2            R_21 = 200, R_22 = 180, R_23 = 170      R_21 = 199, R_22 = 179, R_23 = 169
Move 3            R_31 = 200, R_32 = 180                  R_31 = 199, R_32 = 179
Move 4 (answer)   R_41 = 195, R_42 = 180                  R_41 = 199, R_42 = 184



4.2 Rules with the Same Weights 

In order to confirm the effect of the weight assignment explained in Section 4.1,
Tsume-Go problems were also solved with rules that all have the same weight.

Procedures for Solving Tsume-Go When the system is given a problem, it 
calculates points of every possible move in the manner shown below. The moves 
getting more points are considered more promising ones. The moves are ranked 
in descending order of points. The rank of the correct move is registered. 

Calculation of Points P_m, the point score of Move m, is calculated as follows.
Taking Move m as a training example, the number of rules matched to the
example is denoted as NR_m. The number of objects appearing in the condition
parts of the matched rules (an object appearing in multiple rules is counted
once) is denoted as NO_m. The point score of Move m is then:

P_m = 10 × NO_m + NR_m

NR_m ranges from 0 to 30 and its average is about 10. The rank of the move
is thus determined mostly by NO_m and minor adjustment is done by NR_m.
Take Figure 5 as an example of calculating the point score of a move.




Fig. 5. Examples of calculation of points of moves. 



Suppose that the following two rules are matched to Move 1.

if a ∧ b then Move 1
if b ∧ c ∧ d then Move 1

NR_1 is 2, and NO_1 is 4 (a, b, c and d); thus, P_1 is 42.
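
The same calculation can be written as a short Python sketch. The function name `point_score` and the representation of each rule as a set of condition objects are assumptions made for illustration only; the paper does not describe its data structures.

```python
def point_score(matched_rules):
    """Point score of a move: 10 * NO_m + NR_m.

    matched_rules: one set of condition objects per rule matched to the move
                   (hypothetical representation).
    """
    nr = len(matched_rules)                # NR_m: number of matched rules
    no = len(set().union(*matched_rules))  # NO_m: distinct objects, each counted once
    return 10 * no + nr


rules_for_move_1 = [{"a", "b"}, {"b", "c", "d"}]
print(point_score(rules_for_move_1))  # prints 42
```

Taking the union of the condition sets is what implements the rule that an object appearing in several rules is counted only once.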
4.3 Probability of Rule Accuracy

This algorithm calculates the probability of the accuracy of each rule. When only
the IF-part is matched and the THEN-part is not matched, it is called a “match”,
which is a negative example. When both the IF-part and the THEN-part are
matched, it is called a “hit”, which is a positive example. The probability of
accuracy a_i for a rule i is calculated as follows: a_i = (No. of hits) / (No. of matches).

The probability A_m of a move m is calculated as follows:

$$A_m = 1 - \prod_{i=1}^{n} (1 - a_i),$$

where n is the number of hit rules. When n = 1, A_m = a_1. When
Tsume-Go problems are solved, the probability of every move is calculated and the
move with the highest probability is selected.
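
As a rough sketch of this selection step, the following Python fragment combines rule accuracies into a move probability and picks the best move. The names `move_probability` and `best_move`, the move labels, and the accuracy values are all hypothetical; the accuracies are simply taken as given numbers between 0 and 1.

```python
import math


def move_probability(accuracies):
    """A_m = 1 - prod(1 - a_i) over the rules that hit for the move."""
    return 1.0 - math.prod(1.0 - a for a in accuracies)


def best_move(candidates):
    """candidates: dict mapping a move label to the accuracies of its rules."""
    return max(candidates, key=lambda move: move_probability(candidates[move]))


# Illustrative values only, not taken from the experiments.
candidates = {"move_1": [0.5], "move_2": [0.5, 0.5], "move_3": [0.2, 0.3]}
print(best_move(candidates))
```

With a single rule the formula reduces to that rule's accuracy, and every additional hit rule can only raise the move's probability.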



5 Human Performance in Tsume-Go 

The performance of human players, described in this section, was determined to 
allow a comparison with that of the proposed system. Detailed procedures for 
the cognitive experiment are explained in [16,15].



5.1 Methods 

Subjects were given a Tsume-Go problem and replied with the first move of
the solution within three or four seconds (see footnote 11). The subjects were allowed multiple
replies. There were two kinds of experiments. In one, the problems were shown on
paper for three seconds and the solutions were written on paper. In the other,
the problems were shown on a computer display for four seconds and replies
were made by mouse actions (see footnote 12). The results of the experiments using a mouse
are marked with an asterisk (*) below. The subjects were nine amateurs from 4 kyu to 6 dan
as determined by standard tests. Three kinds of problems were used as test
problems: basic problems, problems for 3 dan, and problems for 5 dan (see footnote 13).
Each kind of problem contained 100 problems, and a total of 300 problems
were solved.

The eye path of the subjects was recorded by an eye tracker. The records 
indicated that the subjects did not search during the experiments. Therefore, 
the performances of human subjects can be compared with those of a system 
which does not search. 



11 Solving Tsume-Go normally means showing a sequence of moves and the result 
such as White dead, Ko, or Black alive. In this experiment only the first move of the 
correct sequence of moves was requested. 

12 Using a mouse needs a bit more time than writing, so one second was added. Training 
in mouse usage was carried out before the experiment. 

13 The names of the problems, such as “basic”, “for 3 dan”, do not exactly represent 
the difficulty of the problems, and they are simply taken from the title of the books. 
5.2 Results

Table 7 lists the correct rate of each subject. The correct rate is the sum of the
reciprocals of the number of replies containing a correct answer (see footnote 14).
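
The scoring rule can be illustrated with a small Python sketch. The function name `correct_rate`, the board coordinates, and the sample replies are hypothetical; the only part taken from the text is that a problem answered with k replies, one of which is correct, contributes 1/k.

```python
def correct_rate(replies_per_problem, answers):
    """Sum of 1/k over problems whose k replies contain the correct move."""
    score = 0.0
    for replies, answer in zip(replies_per_problem, answers):
        if answer in replies:
            score += 1.0 / len(replies)
    return score


# Three made-up problems: one reply (correct), three replies (one correct),
# two replies (none correct).
replies = [["C3"], ["B2", "D4", "E5"], ["A1", "A2"]]
answers = ["C3", "D4", "B9"]
print(correct_rate(replies, answers))
```

Because each kind of problem contains 100 problems, a score computed this way over one kind can be read directly as a percentage.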



Table 7. Correct rate of Tsume-Go within a few seconds. (%)

strength     basic   for 3 dan   for 5 dan   average
6 dan (1)*   63      68          43          58.0
6 dan (2)*   53.5    59.5        38.3        50.4
4 dan        58      68          38          54.7
3 dan        27.0    30.5        21.0        26.2
1 dan (1)    40.2    36.8        20.3        32.4
1 dan (2)    30.8    40.3        22.3        31.1
2 kyu (1)*   21.5    18.3        16.0        18.6
2 kyu (2)    12.3    15.8        11.8        13.3
4 kyu        13.5    12.5        10.8        12.3



The value of the correlation coefficient between the subjects’ strength (see footnote 15)
and the correct rate of Tsume-Go within a few seconds is 0.923, which is very
high. The results show that the task of replying within a few seconds is directly
related to the subjects’ strength and that the task is a proper one to evaluate
the strength of players. They also show that these problems are rather difficult.
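
The coefficient can be checked with a short Python sketch. Two assumptions are made here, since the text does not state them: the coefficient is taken to be the Pearson correlation, and the correct rate used is the "average" column of Table 7. Strength is expressed in dan, with 2 kyu counted as -1 and 4 kyu as -3 as in footnote 15. Under these assumptions the result comes out close to the reported value.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Strength in dan, in the row order of Table 7 (2 kyu = -1, 4 kyu = -3).
strength = [6, 6, 4, 3, 1, 1, -1, -1, -3]
# "average" column of Table 7.
average_rate = [58.0, 50.4, 54.7, 26.2, 32.4, 31.1, 18.6, 13.3, 12.3]

r = pearson(strength, average_rate)
print(round(r, 3))
```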

6 Solving Tsume-Go 

6.1 Methods 

Patterns were acquired in the first step and then refined in the second step. 
Using the refined patterns, Tsume-Go problems were solved. We tested all six 
combinations of the three algorithms for the first step and two for the second 
step. 

The same 300 problems that were presented to the human subjects in Subsection
5.1 were used. These problems were not part of the training problems.

6.2 Overall Results 

The correct rates of solving Tsume-Go for the six combinations are listed in
Table 8. The results show that as the first step algorithm, Flexible Algorithm is

14 When one of the three replies is a correct answer, l/3rd of a point was added to the 
correct rate. 6 dan (A) and 4 dan subjects always made one reply. The correct rate 
is thus the same as the number of correct replies. 

15 2 kyu is -1 dan, and 4 kyu is -3 dan. 
much better than either Semi-Fixed Algorithm or Fixed Algorithm. Semi-Fixed
Algorithm seems a bit better than Fixed Algorithm but the difference is not so
clear.

As for the second step, Probability Algorithm seems a bit better than Weights
Algorithm with Fixed and Semi-Fixed Algorithms, but Weights Algorithm is
better for Flexible Algorithm. This is a very interesting result. It suggests that
the combination of first-step and second-step algorithms is very important.



Table 8. Correct rates of solving Tsume-Go for six conditions.

                        first step
second step     Flexible   Semi-Fixed   Fixed
weights         31.0       13.3         11.0
probability     25.0       15.0         13.3



The time it takes to solve problems is very short: less than 10 seconds for
solving all the 300 problems for any combination using an Ultra Sparc Station (see footnote 16).



6.3 Results of Flexible Rules with Weights 



Using Flexible Algorithm and Weights Algorithm produces the best performance.
In this section the result is shown in detail. Table 9 lists the detailed
results of solving Tsume-Go using the acquired rules with weights. Each entry gives
the percentage of problems whose correct move is at that rank, followed in
parentheses by the cumulative percentage up to that rank. For example,
in solving Basic Tsume-Go, the correct percentage of the 1st rank answer of the
system is 36%, and the correct percentage of the 1st and 2nd rank answers by
the system is 51%.
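
The relation between the per-rank figures and the figures in parentheses in Table 9 is a running total, which the following small Python check makes explicit. It uses the Basic column of Table 9; the variable names are illustrative.

```python
from itertools import accumulate

# Per-rank percentages for Basic problems (ranks 1 to 5), from Table 9.
per_rank_basic = [36, 15, 12, 10, 6]

# Running total: percentage of problems solved within the top k candidates.
cumulative_basic = list(accumulate(per_rank_basic))
print(cumulative_basic)  # prints [36, 51, 63, 73, 79]
```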



Table 9. Correct percentage of solving Tsume-Go using acquired rules with
weights. (%)

Rank   Basic     3 dan     5 dan     Average
1st    36 (36)   31 (31)   26 (26)   31 (31)
2nd    15 (51)   26 (57)   20 (46)   20 (51)
3rd    12 (63)   10 (67)    9 (55)   11 (62)
4th    10 (73)    7 (74)   14 (69)   10 (72)
5th     6 (79)    6 (80)    8 (77)    7 (79)



16 Almost the same speed as a Pentium II 400 MHz.
Comparing the average correct rate of the first-ranked move of the system
(31%) with the average correct rates of human players (Table 7) shows that the performance
of this system almost equals that of 1 dan human players.

In order to know the effect of weight assignment, Tsume-Go was solved by
rules, all of which were assigned the same weight as explained in Section 4.2.
Table 10 lists the results.

Comparing the average correct rate of the first-ranked move of the system
(19%) with the average correct rates of human players (Table 7) shows that the performance
of this system almost equals that of a 2 kyu human player.

Comparing these results with the previous experiments in Table 9 shows that
the performance is much improved by assigning weights to rules in the second
step.

Table 10. Results of solving Tsume-Go by Acquired Patterns. (%)

Rank   Basic     3 dan     5 dan     Average
1st    20 (20)   21 (21)   15 (15)   19 (19)
2nd    14 (34)   12 (33)   13 (28)   13 (32)
3rd     8 (42)    8 (41)   12 (40)    9 (41)
4th    10 (52)    9 (50)    9 (49)    9 (50)
5th    10 (62)   11 (61)   14 (63)   12 (62)



6.4 Performance for Different Search Spaces

In order to see whether the performance of the system differs according to the 
size of search space, the 300 test problems were divided into two groups according 
to the number of candidate moves: the problems of one group had 10 or more 
candidate moves (165 problems) and those of the other group had fewer than
10 candidate moves (135 problems). While the percentage of correct answers
of the system for the former set was 29.1%, that for the latter set was 33.3%. 
This result shows that the system is effective in selecting the first move even if 
the problem has a lot of candidates. Hence this system will help search-based 
Tsume-Go solvers when the problems have many candidates, a situation which 
is very difficult for current search-based solvers. 
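
As a quick consistency check on these figures, the two group rates can be combined into an overall rate weighted by group size. The sketch below only re-uses the numbers quoted in this subsection; the variable names are illustrative.

```python
# (number of problems, percentage of correct first moves) for each group.
groups = [(165, 29.1), (135, 33.3)]

total = sum(count for count, _ in groups)
overall = sum(count * rate for count, rate in groups) / total
print(total, round(overall, 1))
```

The weighted value is about 31%, which agrees with the overall first-rank correct rate reported for the Flexible Algorithm with weights.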



6.5 Effects of Selecting Candidates of the First Move 

In order to know how effective it is to select first move candidates in Tsume-Go,
a preliminary experiment was carried out. For some problems for which our system
chooses the correct first move, the time GoTools takes to solve the problem was
compared to the time it takes to solve the position in which the first move has
already been added to the problem. The result is that the time taken by GoTools is reduced by 80%. For one
problem, the reduction was 92.6% (from 25266.3 to 1870.8 seconds). This preliminary
experiment implies the effectiveness of selecting first move candidates.

7 Discussion 

7.1 Two-Step Learning 

We identified that two steps were needed to acquire patterns, and separated 
the step of acquiring patterns from the step of adding additional conditions or 
information to the patterns. The meaning of this is as follows. In existing systems, 
conflict resolution mechanisms are implemented by hand, which is very hard. The 
proposed two-step model includes learning mechanisms. Some advantages of this 
two-step model are explained below. 

One advantage is that more precise control is possible in this model. When 
conflicts are resolved without additional information to patterns (without refin- 
ing patterns) - for example complexity is the only conflict resolution mechanism 
available and the most complex rules are always winners among matched rules 
- this resolution approach is applied to all rules. On the other hand, in the two- 
step model, the conflict resolution mechanisms can control each rule differently. 
The experiments in this paper showed this advantage. When Tsume-Go prob- 
lems are solved by one general rule without refinement, the percentage of correct 
answers is about 19% (same as a 2 kyu human player). On the other hand, when 
weights are assigned to patterns (more precise control) , the percentage of correct 
answers becomes 31% (same as a 1 dan human player). 

Another advantage is that the steps are independent of each other and the 
effect of an algorithm in one step is easy to measure. This makes it easy to com- 
pare algorithms for the first step and those for the second step. This comparison 
was carried out in Section 6.

Two algorithms implemented in the model in this paper have the following 
features: the first algorithm requires only positive examples, whereas the second 
algorithm requires both positive and negative examples. An advantage in this 
implementation is a computational one. If both steps are calculated in one step, 
both positive examples and negative examples are needed. In Go, for a training 
move, the move is only a positive example, whereas all the other possible moves 
are negative examples, the number of which is about 200 on average. Thus, 
using negative examples is computationally much more expensive than using only 
positive examples. The first step uses only positive examples. After acquiring 
promising patterns in the first step, negative examples are used in the second 
step in this study. When the two steps are integrated in preliminary experiments, 
it takes too much time to acquire as many patterns as this study. This is because 
negative examples are used when a huge number of non-promising rules are 
evaluated. The experiments confirmed that two-step learning is a practical way 
of using negative examples. 

The performance recorded in solving Tsume-Go in this paper is not so important.
This model is a framework for developing and comparing algorithms for
each step. It is important that this model allows algorithms to be compared. The
performance of solving Tsume-Go in this paper is the starting point. We need
to develop better algorithms for this model and compare the results achieved in
order to acquire better patterns.

7.2 First Step 

Comparison with Other Algorithms In previous studies, knowledge that
involved fixed sizes and shapes was acquired. For example, Sei’s system acquires
knowledge as shown in Figure 6, which almost duplicates the patterns
acquired by Fixed Algorithm. As shown in Table 8, Flexible Algorithm offers
much better performance than Fixed Algorithm.












Fig. 6. Examples of knowledge acquired by Sei’s system.



Effects of the Number of Rules The number of patterns may affect their
performance. When the number of rules increases, the performance may improve,
or it may degrade if the number of rules becomes too large (see footnote 17). The algorithm implemented
as the first step in this paper can change the number of acquired rules
by changing the parameter FOOD. Since the effect of the number of rules is
easy to examine in this model, future work will be to solve Tsume-Go using
different numbers of rules.

In this simulation, patterns with weights are acquired, which are expected 
to be the same knowledge representation as that of a 1 dan human player. The 
performance of this simulation is as good as that of a 1 dan human player. By 
increasing the number of rules until performance saturates, we will determine 
whether using patterns with weights can exceed the performance of a 1 dan level 
human player. 

17 When the number of rules increases, the time it takes to match rules will also increase.
It is, however, not so expensive for computers nowadays; it takes a Pentium II 300 
MHz machine about 1 second to play a move using 15,000 rules for a 19 x 19 board 
game. If the performance becomes even worse after adding too many rules, you can 
improve the performance in many ways, such as by adding Go terms or purposes to 
rules. 
7.3 Second Step

This paper considered only two algorithms for the second step. Many other
algorithms, such as neural networks, should be investigated. This is future
work.

7.4 Usefulness of Patterns 

Patterns have been thought useful, but how and when they are useful has not 
been investigated. How patterns are useful is discussed in this section. 

Patterns may improve search performance when the search space is very large. 
The results in Subsection 6.4 show that patterns are useful in solving Tsume-
Go even when the search space is large. The result in Subsection 6.5 shows
that selecting first move candidates in Tsume-Go may improve the speed of the 
search techniques. These two results indicate that it is possible to conclude that 
patterns can improve search performance when the search space is very large. 
The correct rate must be improved further before search techniques can be put 
into actual use. We believe that this future work can be carried out using the 
proposed two-step model. 

Patterns may also be useful for human beginners. Table 5 shows that many
patterns are evaluated to be good by human experts. Weights of rules indicate 
the importance of the rules. Therefore, acquired rules with large weights may 
be useful for human beginners. It may be a very interesting experiment to in- 
vestigate whether beginners can improve their performance by memorizing the 
patterns. 

Toward More Human-like Knowledge This simulation used patterns; that 
is, only stones and the edges of the board are used to describe rules. Cognitive 
studies on the human knowledge of Tsume-Go yi7flhj have revealed the following. 
A 2 kyu player has only rules of board configurations, which is the same as 
this system. A 3 dan player and a 6 dan player have “hybrid patterns” , whose 
conditions are described by Go terms. In order to improve the quality of rules,
acquiring rules with Go terms seems promising. For example, a term
representing the number of liberties of stones may be a good candidate. Since the
first step algorithm is flexible, once Go terms are defined, rules with the terms
can be acquired. This would be very interesting future work.

7.5 Application to Game Playing Systems 

We believe that the proposed two-step model can be applied to game playing sys- 
tems. Each algorithm in the first step and the second step should be reconsidered 
for application to the game playing systems. The first step may need major mod- 
ification. In actual games, the importance of language level descriptions, such as 
purpose and concept, may increase. On the other hand, the second step, conflict 
resolution step, may need only minor modification. We assume, however, that 
the model itself does not need to be changed. Note that this discussion becomes 
possible because of this two-step model. 
8 Conclusions

A two-step model of pattern acquisition was proposed in this paper based on 
our cognitive experiments. This model has advantages in short computational 
time over one-step models. Another advantage is that more precise control over 
conflict resolution is possible. 

Tsume-Go problems were solved using patterns acquired in this model. The 
usefulness of the patterns acquired by this model was also investigated. First, 
the performance of the system is as good as 1 dan human players. Second, 
patterns are effective even when the search space is large, thus patterns can 
improve search technique performance when the search space is very large. Third, 
acquired patterns with large weights may be useful for human beginners. 
Abstract. Go is a difficult game to write a computer program for because
of its space complexity. Therefore, it is important to explore another
approach that does not rely on search algorithms only. In this paper, we
focus on tsume-go problems (local Go problems) that have a unique so- 
lution. A three-layer neural network program has been developed to find 
a solution at a given position of tsume-go problems, where the attacker is 
to kill the defender’s territory on a 9 x 9 board. The network consists of 
162 neurons for the input layer, 300 neurons for the middle layer, and 81 
neurons for the output layer. We let the network learn the current stone 
patterns and, hence, process a direct answer. The network learns 2000 
patterns of tsume-go by the back-propagation method. Within 500 re- 
peats, the network learns 2000 patterns correctly. We tested the network 
ability: the top three selected moves contain about 60% correct answers, 
and the top five, about 70% for unknown problems at 500 repeats of 
learning. We compare the rate of correct answers by the network with 
that of human players who replied in a few seconds only. The ability of 
the network is roughly equivalent to that of a 1-dan human player.
Application of neural networks for a computer program of tsume-go (and 
also Go) combined with a pattern classifier might provide a prospective 
approach to create a strong Go-playing program. 

Keywords: neural network, tsume-go, back-propagation, unique solu- 
tion 



1 Introduction 

Computer technologies have been developed rapidly in recent years and the cal- 
culating ability of computers has surpassed in some respects that of human 
beings. It is widely known that the best computer chess program recently de- 
feated the world human champion. The superb ability of conventional computers 
has considerably contributed to this result. 

However, in some games like Go, computer programs are largely inferior to 
moderate human players. Currently the best Go program is still weaker than the 

* At present affiliated to the Satellite Venture Business Laboratory, 

Shizuoka University, 3-5-1 Johoku, Hamamatsu, 432-8561 Japan. 
average human player. One of the main reasons for this is that the search tree of
a Go program is significantly larger than that of a chess program.

In chess programs, an almost full-width search can be carried out. However,
it is impossible to make a Go program look ahead through the entire tree because
the size of the tree is extremely large. Therefore, it is important to explore an
alternative approach that does not rely on searching only.

Tsume-go is a local problem of Go for which the attacker tries to kill a 
group/territory of the defender in a given setting. Among tsume-go programs, 
the best computer program is thought to be Wolf’s program (GoTools).
GoTools uses a conventional search algorithm, and is limited in its search
ability. When stones of tsume-go problems are placed in a relatively large area,
GoTools tends to spend a significantly long time to solve them; in some cases, it 
cannot solve the problem at all because the search tree is too large. Thus, even 
in tsume-go, we need to explore an approach other than the conventional search 
algorithm. In this paper we consider the application of neural networks as an 
alternative approach for tsume-go and Go-playing programming. 

It is known that neural networks have a high ability of pattern matching and 
generalization. If a neural network learns many patterns of the stones in Go, the 
network would automatically reply to any unknown input pattern. Therefore, the 
network can play an individual move by learning and generalization. In addition, 
the neural network can output the answers immediately for a given input pattern. 
The time the network spends simply depends on the size of the network, and it
is minimal compared to conventional search programs. Thus we have two
advantages in using neural networks for Go and tsume-go programming: an ability
to find a solution for unknown problems and a fast response.

There are some studies on the application of neural networks to Go programming.
Enzenberger applied a neural network to the calculation of an evaluation
function. In his program, the network uses transformed features
of stones, such as strings and groups of stones. Thus it does not evaluate the current
input patterns directly. Furthermore, his program NeuroGo is much weaker than
conventional Go programs. The performance level of NeuroGo may correspond
to the medium playing level 8 (out of 20) of the conventional program
“The Many Faces of Go” played on a 9 x 9 board. Richards et al. also used
evolving neural networks for Go programming. Their networks evolve according to
how well they play the game, instead of individual moves. Thus the current use
of neural networks does not focus on individual moves.

In Go and tsume-go, however, individual moves highly depend on the current 
stone patterns. In Go, a position that needs a unique solution appears very often 
in “local” games and in the endgames. In tsume-go problems, usually there is 
only one solution. 

In this paper, we describe a neural network program for tsume-go. The network
learns the current stone patterns and finds good candidates. By learning
the best moves in many tsume-go patterns, the network should be able to find 
a solution in tsume-go and the Go endgames. 
2 The Application of Neural Networks

2.1 The Feature of Neural Networks 

The neural network has been widely used in Artificial Intelligence research. Neurons
of artificial neural network models have a nonlinear input-output characteristic
like real neurons. We show the typical characteristics of a neuron in Figure
1. Hereafter, we use the term “neuron” to mean a neuron of a neural network model.



Fig. 1. The typical characteristics of a neuron. (a) Neuron’s input and output. (b) Continuous-type input-output characteristic of a neuron (threshold = 0).



Neurons have multiple inputs from others. Each input connection has a
weight (w_i) that indicates the strength of the connection. w_i is allowed
to take a negative value. The product of input value x_i and weight w_i is the effective
input of each connection.

Neurons have two states (see footnote 1), depending on the values of effective inputs to a
neuron. Usually, the input-output characteristic of a neuron is expressed as either
a step function or a continuous function that has a sharp gain near the threshold
value (Figure 1(b) and Equation 1). We use a continuous function for the

1 In general, the characteristic of biological neurons is expressed that they have two 
states. Though neurons of neural networks might have the middle-range value near 
the threshold input, we usually say that neurons have two states because of the 
analogy with biological neurons. 
input-output characteristic of neurons in this paper. If the sum of effective input
values is higher than the threshold value, the neuron is activated, and if the sum
is lower, the neuron is inactivated. When the threshold value is 0, output value
y is calculated by Equation 1:

$$y = \frac{1}{2}\left(\tanh\left(\sum_i w_i x_i\right) + 1\right) \qquad (1)$$
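
A continuous input-output characteristic of this kind can be sketched in Python as follows. The sketch assumes the common form y = (tanh(sum of w_i x_i) + 1) / 2, which keeps the output between 0 and 1; the factor 1/2, the function name `neuron_output`, and the way a non-zero threshold is subtracted from the sum are assumptions for illustration, since the printed equation is only partly legible.

```python
import math


def neuron_output(inputs, weights, threshold=0.0):
    """Continuous neuron output in (0, 1), assuming y = (tanh(s) + 1) / 2.

    s is the sum of effective inputs w_i * x_i minus the threshold
    (the threshold handling is an assumption, not taken from the paper).
    """
    s = sum(w * x for w, x in zip(weights, inputs)) - threshold
    return 0.5 * (math.tanh(s) + 1.0)


print(neuron_output([0, 0], [1.0, 1.0]))    # sum at the threshold: 0.5
print(neuron_output([1, 1], [3.0, 3.0]))    # strongly activated: close to 1
print(neuron_output([1, 1], [-3.0, -3.0]))  # strongly inactivated: close to 0
```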



In this paper, we use the back-propagation learning method, which is a popular
supervised learning method for neural networks. A multilayer feedforward
neural network is used in back-propagation learning. Figure 2 shows
an example of a three-layer neural network structure.



Fig. 2. An example of a three-layer neural network structure (input layer, middle layer, output layer).



Each connection has a weight (w_i) that indicates the strength of the
connection. The network can reply with the specified output patterns for the
specified input patterns by adjusting w_i. Adjusting w_i constitutes the
learning of the neural network. The supervised data consist of problems for the
input layer and answers for the output layer. When the problem data are fed to
the input layer neurons, the network replies with signals from the output
layer. We expect some error between the supervisor’s answer and the network’s
answer. In the back-propagation method, w_i is updated following Equation (2),
and the error should be reduced by each update:

    w_i(t+1) = w_i(t) - \mathrm{eps} \, \frac{\partial E}{\partial w_i} \, x_i    (2)

Here, E denotes the sum of squared errors between the supervisor’s answer and
the network’s output, and eps is the learning coefficient that determines the
speed of learning. As a result of repeated updates of w_i, the network’s
outputs approach the supervisor’s values.
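The update rule can be illustrated for a single neuron. This is a minimal sketch and not the authors' implementation: it assumes the tanh-based activation of Equation (1), a squared error E = (target - y)^2, and the standard chain-rule form in which the input x_i multiplies the error signal. The names `forward`, `squared_error` and `update_weights`, and the value eps = 0.5, are hypothetical.

```python
import math


def forward(weights, inputs):
    """Neuron output y = (tanh(net) + 1) / 2 for net = sum of w_i * x_i."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 0.5 * (math.tanh(net) + 1.0)


def squared_error(weights, inputs, target):
    """Squared error between the supervisor's value and the neuron output."""
    return (target - forward(weights, inputs)) ** 2


def update_weights(weights, inputs, target, eps=0.5):
    """One gradient-descent step on the squared error of a single neuron."""
    y = forward(weights, inputs)
    # dE/dy = -2 (target - y); dy/dnet = 2 y (1 - y) for this activation.
    error_signal = -2.0 * (target - y) * 2.0 * y * (1.0 - y)
    # dE/dw_i = error_signal * x_i, so each weight moves against its gradient.
    return [w - eps * error_signal * x for w, x in zip(weights, inputs)]
```

Repeated calls to `update_weights` on the same pattern move the output toward the target, which mirrors the statement that the network's outputs approach the supervisor's values.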

Many patterns can be learned by a neural network. The number of patterns it can
memorize depends on the structure and parameters of the network, e.g., the
number of neurons and layers. Neural networks trained by the back-propagation
method have the abilities of classification and generalization. We use these
abilities of neural networks to produce the next move in tsume-go problems.

2.2 Structure of Our Program 

A three-layer feedforward neural network is used in the simulation. Each neuron
in a layer has a connection to all the neurons of the next layer. The structure
of the neural network and its corresponding board position are shown in
Figure 3.

Neurons of the input layer correspond to the locations on the board in tsume-go
problems. The board size is 9 x 9, about one-fourth the size of the standard Go
board. Therefore, all tsume-go problems used in this paper are confined to the
9 x 9 area (total: 81 lattice points). The number of necessary neurons in a
network depends on the board size. Two neurons in the input layer are used for
each lattice point on the board. One input neuron indicates whether or not
there is a black stone at that position, and the other neuron indicates whether
or not there is a white stone. Altogether 162 = 9 x 9 x 2 neurons are present
in the input layer of the current network. When a stone is at a position, the
input value of the neuron that corresponds to the position and the stone color
is 1. When no stone is present at the position, the input values of the two
neurons are 0. In the output layer, only one neuron is allocated to each board
position in the current network. We only ask for the next move. Therefore, the
number of neurons in the output layer is 81 = 9 x 9.
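The input encoding described above can be sketched as follows. The paper does not state how the 162 input neurons are ordered, so the interleaved layout used here (black neuron, then white neuron, for each point in row-major order) is an assumption, as are the name `encode_board` and the 0-based (row, column) coordinates.

```python
BOARD_SIZE = 9  # the 9 x 9 area used for all problems


def encode_board(black, white):
    """Encode stone positions as 162 input values (two neurons per point).

    black, white: iterables of 0-based (row, col) pairs.
    Assumed layout: index 2*p is the black neuron and 2*p + 1 the white
    neuron of point p = row * 9 + col.
    """
    inputs = [0.0] * (BOARD_SIZE * BOARD_SIZE * 2)
    for row, col in black:
        inputs[2 * (row * BOARD_SIZE + col)] = 1.0
    for row, col in white:
        inputs[2 * (row * BOARD_SIZE + col) + 1] = 1.0
    return inputs
```

An empty point leaves both of its neurons at 0, so the number of input values equal to 1 is the number of stones on the board.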

The number of middle layer neurons is an important factor that affects the
learning ability of the network. Here, we set 300 neurons in the middle layer.
We discuss the relationship between the number of middle layer neurons and the
network’s learning ability in Section 3.3 below.

When stone positions are fed to the neurons of the input layer, the output of
the network produces an answer. We interpret the location indicated by the most
highly activated neuron in the output layer as the move selected as the answer.

2.3 Learning Procedures 

The neural network learns tsume-go patterns by the back-propagation method.
All the problems that the network learns in the current work are classified
as “Kurosen-Shiroshi”. The two players are distinguished as the first player
(black stones) and the second player (white stones). Here, the first player is
the attacker, who is to kill the territory of the defender (the second player).
The problems are designed so that the white stones are killed in the end if the
attacker selects the optimal moves. We collected problems as a test set from
several tsume-go problem books (0 and others). The answer to a tsume-go problem
often consists of several sequential moves. We used every move of the attacker
in a problem as a single learning datum.

Fig. 3. The board position and the structure of the neural network. (a) The
9 x 9 board of Go. (b) The network: input layer (162 neurons), middle layer
(300 neurons), output layer (81 neurons). Each input layer neuron indicates a
position in question where the attacker is to move and a stone color, and each
output layer neuron indicates a move of the attacker for the position
considered.

It is generally known that the third move (i.e., the second move of the
attacker) of a tsume-go solution is easier than the first move; the fifth move
is easier than the third and first moves; and so on. However, the problems vary
widely in difficulty. The first move of a given problem can be much easier than
the fifth move of another problem. Therefore, we ignored these differences
between moves, and all the attacker’s moves in a problem are used as data. We
collected a total of 3388 stone positions, each together with its solution,
i.e., the attacker’s move. We prepared five different sets of learning data
from these data (data sets A to E). Each data set consists of 2000 moves of the
attacker for learning, and 1000 moves for testing as unknown problems. We built
five networks of the same structure for the five different data sets. Learning
was done by the following procedure:

1. Input all the stone locations of a position into the input layer. If there
is a stone, we set the neuron that indicates its location and stone color to 1,
and all other neurons are set to 0.

2. The target activity level of the single output layer neuron that corresponds
to the correct answer is set to 0.9, while the target levels of the neurons
that correspond to wrong answers are set to 0.1.

3. The weights of the network connections are updated by the back-propagation
method (one round).

4. After the network has experienced all 2000 patterns once, it is exposed to
these patterns a second time (2nd round). By updating (learning) many times
(rounds), the neuron activities should converge to either 0.9 for the correct
answer (only one neuron), or 0.1 for wrong answers (all the remaining neurons).
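The four steps above can be sketched as a training loop. The target values 0.9 and 0.1 and the notion of a round (one pass over all patterns) come from the text; the names `make_target` and `train`, and the callback `network_update` that stands in for one back-propagation update, are hypothetical.

```python
def make_target(correct_index, n_outputs=81, hit=0.9, miss=0.1):
    """Target vector: 0.9 for the correct move, 0.1 for every other point."""
    target = [miss] * n_outputs
    target[correct_index] = hit
    return target


def train(network_update, patterns, rounds):
    """Present every pattern once per round, for the given number of rounds.

    network_update(inputs, target) is assumed to perform one weight update
    by back-propagation; patterns is a list of (inputs, correct_index) pairs.
    """
    for _ in range(rounds):
        for inputs, correct_index in patterns:
            network_update(inputs, make_target(correct_index))
```

With 2000 patterns, one call with `rounds=500` would perform 1,000,000 individual updates under this reading of a round.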

3 Results 

3.1 Learning Results 

During and after learning, we examined the ability of the neural network to
answer “known” and “unknown” problems. “Known” problems are the ones used in
learning (2000 patterns), and “unknown” problems are those not used in learning
(1000 patterns). In the case of known problems, the rate of correct answers
means the number of correct answers divided by the total number of answers
(2000). The answers of the network were evaluated in the following manner:

1. We input the stone positions into the input layer and pick the most highly
activated neuron in the output layer. This position is considered the move
selected.

2. When a stone is already at the selected position, we pick the next most
highly activated neuron for the answer, and so on.

3. If the network’s answer corresponds to a correct answer, we judge that the
network replied with a correct answer.
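The evaluation steps above amount to ranking the output neurons by activity and skipping occupied points. A minimal sketch, with the hypothetical name `ranked_moves` and output neurons indexed by board point:

```python
def ranked_moves(outputs, occupied, k=1):
    """Return the k most highly activated output indices that are not occupied.

    outputs: activity levels of the output neurons (one per board point).
    occupied: set of point indices that already hold a stone.
    """
    order = sorted(range(len(outputs)), key=lambda i: outputs[i], reverse=True)
    return [i for i in order if i not in occupied][:k]
```

For example, with outputs `[0.1, 0.9, 0.5, 0.7]` and point 1 occupied, the selected move is point 3, because the most active neuron is skipped.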

Table 1 shows the numbers and percentages of correct answers of the neural
networks. After learning for a few hundred rounds, the network produces correct
answers for all the problems. Figure 4 shows the rate of correct answers for
one of the five data sets. Note that the horizontal axis (learning steps) is on
a logarithmic scale. The rate of correct answers increases almost linearly and
converges to unity at around one hundred rounds. Thus our results demonstrate
that the neural networks are able to learn at least a couple of thousand moves
correctly within a few hundred rounds of learning.



Table 1. The rate of correct answers for known problems (2000 patterns) in
three-layer networks.

Learning     Correct answers (%)
(rounds)     Data A        Data B        Data C        Data D        Data E
0            6(0.30)       8(0.40)       8(0.40)       7(0.35)       6(0.30)
1 x 10^0     147(7.35)     105(5.25)     101(5.05)     87(4.35)      77(3.85)
5 x 10^0     847(42.35)    874(43.70)    844(42.20)    882(44.10)    916(45.80)
1 x 10^1     1233(61.65)   1265(63.25)   1309(65.45)   1290(64.50)   1312(65.60)
5 x 10^1     1978(98.90)   1970(98.50)   1985(99.25)   1974(98.70)   1966(98.30)
1 x 10^2     1995(99.75)   1996(99.80)   1997(99.85)   1995(99.75)   1995(99.75)
5 x 10^2     2000(100)     2000(100)     2000(100)     2000(100)     2000(100)
1 x 10^3     2000(100)     2000(100)     2000(100)     2000(100)     2000(100)



3.2 Answers to Unknown Problems 

Here we examine the ability of our neural networks to answer 1000 unknown
problems. We follow the same evaluation procedure as in the case of the known
problems, except that the five most highly activated neurons are used to select
plausible candidates, instead of the single most highly activated neuron used
for known problems.

The numbers and percentages of correct answers at 500 learning steps are shown
in Table 2. The rates of correct answers are similar among all data sets (data
A to E). The cumulative rate of correct answers improves significantly up to
the third answer, but the improvement becomes less significant for the fourth
and fifth answers. These results demonstrate that the trained networks have
some ability to choose correct moves for unknown patterns.
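The cumulative rates can be computed from the rank at which the correct move appears among the network's candidates. The sketch below uses the hypothetical name `cumulative_rates`; the usage example takes the counts reported for data set A in Table 2.

```python
def cumulative_rates(rank_of_correct, k_max=5):
    """Cumulative percentage of problems solved within the top 1..k_max answers.

    rank_of_correct: for each problem, the 1-based rank of the correct move
    among the network's candidates, or None if it is not among them.
    """
    total = len(rank_of_correct)
    rates = []
    hits = 0
    for k in range(1, k_max + 1):
        hits += sum(1 for r in rank_of_correct if r == k)
        rates.append(100.0 * hits / total)
    return rates


# Counts for data set A: 338, 161, 95, 58 and 46 problems solved by the
# 1st to 5th answers, and 302 problems not solved within five answers.
ranks_a = [1] * 338 + [2] * 161 + [3] * 95 + [4] * 58 + [5] * 46 + [None] * 302
rates_a = [round(r, 1) for r in cumulative_rates(ranks_a)]
# rates_a is [33.8, 49.9, 59.4, 65.2, 69.8]
```

The gain per additional answer shrinks from 16.1 points (second answer) to 4.6 points (fifth answer), which is the flattening described in the text.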

It is also shown that the first few answers are highly likely to include the
correct answer. The rate of correct answers for the first answer and two
cumulative rates for unknown problems (1000 patterns) are shown for one of the
five data sets (Figure 5). The rate of correct answers increases almost
linearly, and at around a hundred rounds it converges to a level that depends
on the number of selected answers. After 500 learning steps, the cumulative
rate of the top 3 answers becomes approximately 55-65%, and that of the top 5
answers, 65-75%. Thus a few hundred learning steps are enough for the network
to achieve its best performance for unknown problems, as in the case of the
known problems.

Fig. 4. The rate of correct answers of a network for known problems (data A).
Horizontal axis: the learning step (rounds).

These results suggest that the neural network acquired the characteristic
pattern knowledge of tsume-go by learning a certain number of given patterns of
tsume-go problems. The networks gave correct answers to a certain degree for
unknown problems by classifying such characteristic patterns of moves.

3.3 The Sizes of Network 

How many neurons are necessary and/or sufficient for the middle layer of the
network to produce the correct answers for tsume-go problems? We examined the
relationship between the number of middle layer neurons and the learning
ability of the networks for tsume-go problems. The number of neurons in the
middle layer was varied from 25 to 300. After 1000 rounds of learning, we
examined the answering ability of the networks for known problems (Figure 6).

The results indicate that at least 100 or 150 neurons are necessary in the
middle layer to learn 2000 patterns of tsume-go, and that 150-200 neurons are
sufficient. These results assure us that the 300 neurons used in all the tests
in this paper are sufficient to obtain the best performance of the neural
networks.

Table 2. The rate of correct answers of networks for unknown problems (at 500
learning rounds).

Correct (%)     Data A       Data B       Data C       Data D       Data E
1st answer      338(33.8)    377(37.7)    325(32.5)    339(33.9)    365(36.5)
  Cumulative    338(33.8)    377(37.7)    325(32.5)    339(33.9)    365(36.5)
2nd answer      161(16.1)    160(16.0)    139(13.9)    137(13.7)    147(14.7)
  Cumulative    499(49.9)    537(53.7)    464(46.4)    476(47.6)    512(51.2)
3rd answer      95(9.5)      108(10.8)    87(8.7)      80(8.0)      95(9.5)
  Cumulative    594(59.4)    645(64.5)    551(55.1)    556(55.6)    607(60.7)
4th answer      58(5.8)      63(6.3)      57(5.7)      60(6.0)      60(6.0)
  Cumulative    652(65.2)    708(70.8)    608(60.8)    616(61.6)    667(66.7)
5th answer      46(4.6)      31(3.1)      55(5.5)      34(3.4)      33(3.3)
  Total         698(69.8)    739(73.9)    663(66.3)    650(65.0)    700(70.0)
  Mistake       302(30.2)    261(26.1)    337(33.7)    350(35.0)    300(30.0)

Fig. 5. The rate of correct answers of the network for unknown problems (data
A). Horizontal axis: the learning step (rounds).

Fig. 6. The relationship between the number of middle-layer neurons and the
learning ability (at 1000 learning rounds).



4 Discussion 

4.1 The Comparison with Human Ability 

The ability of the neural networks is compared with that of human players. When
human players are forced to answer in a short time, they seek answers from
their impression of the stone patterns, because they do not have time to search
and evaluate sequential moves. Similarly, the neural network quickly reaches
answers based solely on stone patterns. Therefore, the network performance can
be compared with that of human players giving such quick responses.

Yoshikawa and others studied the ability of human players by showing them
tsume-go problems for a few seconds (Zj. Their experiment was carried out under
the conditions and methods described below.

Nine human players (subjects) saw a problem and answered with a move to play
within a few seconds. Problems were collected from three tsume-go books of
three differing levels of difficulty (basic, 3-dan and 5-dan) [B|-[1XJ-. Each
tsume-go book contains 100 problems; a total of 300 problems were used for the
experiment. The players were allowed to give multiple answers. If a player gave
only one answer and it was correct, the player acquired one point. If the
player gave N answers and one of them was correct, the player acquired 1/N
points. The rate of correct answers was defined as the acquired points divided
by the number of problems.
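The scoring rule above can be written down directly. This sketch assumes that a player who gives N answers, one of which is correct, receives 1/N points, and that no points are given otherwise; the names `points` and `correct_rate` are hypothetical.

```python
def points(answers, correct):
    """Points for one problem: 1/N if the correct move is among N answers."""
    if not answers:
        return 0.0
    return 1.0 / len(answers) if correct in answers else 0.0


def correct_rate(responses):
    """Rate of correct answers (%) over a list of (answers, correct) pairs."""
    total = sum(points(answers, correct) for answers, correct in responses)
    return 100.0 * total / len(responses)
```

Under this rule, two answers containing the correct move are worth half a point, so listing more candidates only pays off if it raises the hit rate by more than it dilutes the points.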

Table 3 shows the rate of correct answers of the human players [Zj. There is a
clear relationship between the rate of correct answers and the skill of the
human players.



Table 3. The rate of correct answers (%) of 9 human players responding in a few
seconds. All data are quoted from 0.

Level of difficulty     basic    for 3-dan    for 5-dan    average
Number of problems      100      100          100
Skill of player
  6-dan a               63       68           43           58.0
  6-dan b               53.3     59.5         38.3         50.4
  4-dan                 58       68           38           54.7
  3-dan                 27.0     30.5         21.0         26.2
  1-dan a               40.2     36.8         20.3         32.4
  1-dan b               30.8     40.3         22.3         31.1
  2-kyu a               21.5     18.3         16.0         18.6
  2-kyu b               12.3     15.8         11.8         13.3
  4-kyu                 13.5     12.5         10.8         12.3



The ability of the neural network is compared with the skill of human players 
based on the rate of correct answer in the following manner. It is preferable to 
use the exact same problems used for human players to test the network ability. 
However, the current network is set to learn only “Kurosen-Shiroshi” problems 
on a 9 x 9 board. Therefore, we selected the problems to be tested from all the 
problems tested for human skills. The number of the unknown problems used 
for the network tests was 61 for the book of basic, 43 for the book of 3-dan, and 
47 for the book of 5-dan. 

The rate of correct answers of the network after 500 learning rounds is shown
in Table 4. Here “1st” means the network’s first answer. Similarly, 2nd, 3rd,
4th and 5th mean the network’s n-th answer. “Cumulative” for the n-th answer is
the sum of the rates from the 1st to the n-th answer.

The network cannot decide how many answers to select. Therefore, we have to
decide how many answers to choose. In “score 1”, only the first answer is used
as the sole answer for each problem. In “score 2”, the first and second answers
are used for each problem, and the sum of the two rates is halved. This is
equivalent to the case in which a human player responds with two answers for
each problem. Both score 1 and score 2 are calculated by the same rule used for
calculating human scores in (Jj.

Therefore, we can roughly compare the data presented in Table 3 with scores 1
and 2 in Table 4. Based on the individual data and averages (see Tables 3 and
4), the ability of the current network is roughly equivalent to human 1-dan
skill.
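The two scores and their weighted averages can be checked with a few lines of arithmetic. The rates and problem counts below are the values reported for the data A network (61, 43 and 47 problems); the variable and function names are hypothetical.

```python
# Number of test problems and rates (%) of the 1st and 2nd answers per book.
problems = {"basic": 61, "3-dan": 43, "5-dan": 47}
first = {"basic": 41.0, "3-dan": 27.9, "5-dan": 21.3}
second = {"basic": 19.7, "3-dan": 27.9, "5-dan": 21.3}


def weighted_average(scores, counts):
    """Average of per-book scores, weighted by the number of problems."""
    total = sum(counts.values())
    return sum(scores[book] * counts[book] for book in counts) / total


# Score 1 uses only the first answer; score 2 halves the sum of two answers.
score1 = dict(first)
score2 = {book: (first[book] + second[book]) / 2 for book in first}

average1 = round(weighted_average(score1, problems), 1)  # 31.1
average2 = round(weighted_average(score2, problems), 1)  # 26.8
```

Both averages fall close to the averages of the two 1-dan players in the human data (32.4 and 31.1), which is the basis for the rough 1-dan estimate.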



Table 4. The rate of correct answers of the neural network (%). Score 1 and
score 2 are calculated by the same rule as in 0 (data A learning network). The
weighted averages of score 1 and score 2 over the 3 levels of difficulty are
31.1 and 26.8, respectively.

                basic    for 3-dan    for 5-dan
problems        61       43           47
1st             41.0     27.9         21.3
  Cumulative    41.0     27.9         21.3
2nd             19.7     27.9         21.3
  Cumulative    60.7     55.8         42.6
3rd             3.3      23.3         12.8
  Cumulative    63.9     79.1         55.3
4th             4.9      2.3          4.3
  Cumulative    68.8     81.4         59.6
5th             6.6      2.3          2.1
  total         75.4     83.7         61.7
score 1         41.0     27.9         21.3
score 2         30.3     27.9         21.3



4.2 Future Improvements of a Neural Network for Go 

The previous simulation shows that the best three moves contain 55-65% correct
answers, while the best five moves contain 65-75% correct answers for unknown
problems. To check whether these network answers are better than random choices
among the most frequent answers in tsume-go problems (e.g., stone positions 2-1
and 2-2), we calculated the percentage occurrence of the three most frequent
answers among all 3388 problems. The three most frequent answers are 7-8
(equivalent to 2-1 from the current positioning), 5-8 (4-1), and 4-8 (5-1). The
total percentage of the three answers is only about 25%, which is inferior to
the current network performance (55-65%). This indicates that the neural
network has at least some ability to learn the stone patterns and to produce
the correct moves. It shows clearly that the neural network acquires a
classification of the stone patterns that can be used for unknown problems.
Thus our results imply that neural networks can provide an alternative approach
to the traditional search algorithms in tsume-go programming.
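The frequency baseline described above can be sketched as follows. The function name `top_k_share` is hypothetical, and the answer labels in the usage note are toy values, not the 3388 positions of the paper.

```python
from collections import Counter


def top_k_share(answers, k=3):
    """Percentage of problems whose answer is one of the k most frequent moves."""
    counts = Counter(answers)
    covered = sum(n for _, n in counts.most_common(k))
    return 100.0 * covered / len(answers)
```

For a toy list such as three problems answered by "7-8", two by "5-8", one by "4-8" and four by other distinct points, the function returns 60.0. Applied to the real answers, the corresponding figure reported in the text is about 25%.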

We still need to improve the rate of correct answers for unknown problems. In
tsume-go problems, all the moves must be correct to reach the correct solution
(move sequence). Here we studied a simple, typical neural network to examine
the possibility of neural networks for Go programming. As discussed below,
there are a variety of ways to modify the current networks to improve the
network performance.

The learning target was set to 0.9 for the (one) correct answer and 0.1 for
the others (wrong answers). These values were chosen to attain a clear
distinction between the two, while achieving the best convergence (Figure 4).
When we set 1.0 and 0.0 for the correct and wrong answers, respectively, the
rate of correct answers does not converge to 1 even after 10000 rounds of
learning; it reaches about 0.85. This is probably because 1.0 and 0.0 lie at
the bounds of the output range, which makes convergence extremely difficult.
There should be an optimal combination of the two target values.

The back-propagation method can also be replaced with other network learning
methods. We can also classify the types of moves specifically and use different
networks for different types of moves. For example, the “Kurosen-Shiroshi”
category used in this paper includes many types of moves, such as “hane,”
“tsuke” and “hourikomi” (Figure 7). If a network learns one type of move, the
rate of correct answers is known to rise for problems of the same type [TTj.
Furthermore, the categories can be classified by a different network.

Thus neural networks can be more complex than and different from the current
networks, and there are some optimal conditions for an arbitrary network.




Fig. 7. Examples of the types of moves: (a) hane, (b) tsuke, (c) hourikomi.



We can also build a hierarchical system of networks to produce good candidates.
For example, we can build a two-level system of networks: a top network is used
as a classifier of the type of move, and many bottom networks each work on a
specific type of move to produce an individual move (Figure 7). Another
classifier can be added for the categories of problems at the top of the
hierarchy. A classifier can be a conventional search algorithm instead of a
neural network.

Furthermore, the network jobs can be divided in many different ways:
classification by the size of patterns on the board, classification by the
connectivity of stone groups, and division into simpler patterns.

In tsume-go problems where the attacker is to kill the defender’s
territory/group, even a single wrong choice is not allowed. Therefore, to build
a full tsume-go processor program, every single move must be correct even for
unknown problems. The performance of the current networks is relatively poor in
this respect. However, as discussed above, by modifying the networks and
combining several methods, we should be able to improve the performance of the
networks.



4.3 Building a Tsume-Go Processor Program 

Once we achieve a higher rate of correct answers for each move, we can build a
whole tsume-go program. The neural networks described in this paper answer the
next move only. A tsume-go solver should answer sequential moves to “solve” a
given problem. There would be at least the following two types of algorithm for
building a tsume-go solver.

One type may consist of neural networks only. In the current simulation, we
applied neural networks only to generate candidate moves of the attacker in
tsume-go problems, and the networks learned the attacker’s moves only. But
neural networks can also learn the moves of the defender. We can build a pair
of special networks to generate sequential moves: one network for the moves of
the attacker and one for those of the defender.

Another type may be a combination of neural networks and conventional search
algorithms. First, the neural networks pick some candidates, and then a search
algorithm preferentially searches the game trees of the candidates. The
combined program of neural networks and search algorithms is likely to reach
the correct answers faster than the search algorithm alone.
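The combined approach can be sketched as network-guided move ordering. This is only an illustration of the idea, not a tsume-go solver: `is_winning_move` is a hypothetical callback standing in for a conventional search that verifies a candidate, and `first_verified_move` and `top_k` are likewise assumed names.

```python
def first_verified_move(position, network_scores, is_winning_move, top_k=5):
    """Try the network's best candidates first and verify each one by search.

    network_scores: activity levels of the output neurons for the position.
    is_winning_move(position, move): assumed to run a conventional search
    and return True if the move solves the problem.
    Returns (move, number of candidates searched), or (None, count) if no
    candidate among the top_k is verified.
    """
    order = sorted(range(len(network_scores)),
                   key=lambda i: network_scores[i], reverse=True)
    candidates = order[:top_k]
    for tried, move in enumerate(candidates, start=1):
        if is_winning_move(position, move):
            return move, tried
    return None, len(candidates)
```

When the network ranks the correct move among its first few answers, the search only has to examine a few subtrees before succeeding, which is the expected source of the speed-up.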



5 Conclusions 

In the endgames of Go and in tsume-go problems, positions that need a unique
solution appear very often. We applied neural networks to solving tsume-go
problems on a 9 x 9 board. After learning for a few hundred rounds, a neural
network can answer all known problems perfectly. The network also produces
correct answers for unknown problems to some extent. Our results indicate that
a neural network is able to produce next moves in tsume-go problems. The
current neural network program may be used as a component of strong tsume-go
and Go programs.

Abstract. The game of checkers can be played by machines running
either heuristic search algorithms or complex decision making programs
trained using machine learning techniques. The first approach has been
used with remarkable success. The latter approach yielded encouraging
results in the past, but later results were not so useful, partly because
of the limitations of current machine learning algorithms. The focus of
this work is the study of techniques for distributed decision making and
learning by Multi-Agent DEcision Systems (MADES), by means of their
application to the development of a checkers playing program. In this
paper, we propose a new architecture for knowledge based systems
dedicated to checkers playing. Our aim is to show how several known
models for checkers playing can be integrated into a MADES that learns
how to combine individual decisions, so that the MADES plays better
than any of them, without “a priori” knowledge of the quality or area of
expertise of each model. In our MADES, we integrate well known search
algorithms along with standard machine learning algorithms. We present
results that clearly show that the team as a single entity plays better
than any of its components working in isolation.



1 Introduction 

Computer programs for checkers play have traditionally been built using quite
different approaches and paradigms. Heuristic search combined with database
lookups has yielded impressive results US), while machine learning algorithms
have fared poorly. As stated in JTS], some methods like genetic algorithms,
neural nets and function optimization have been tried for the task of learning
to classify checkers situations (as either win, loss, or draw for a given
color), but were discarded because of unacceptably high error rates. The
authors of the present paper have experimented with several machine learning
paradigms (ID3 |jfj, C4.5 (3, Bayesian learning ED, and backpropagation EDI)
and have obtained similar results. In Section 2 we discuss this issue and
report on our own experience.

On the one hand, heuristic search combined with database lookup requires the
use of huge resources and takes limited advantage of available knowledge, but
performs satisfactorily. On the other hand, machine learning programs should
provide a satisfactory solution to a problem that is full of learning
opportunities, but they fail to do so in the general setting given the huge
hypothesis spaces. Since we believe that one can take advantage of both
approaches, we propose to integrate them in a distributed checkers playing
system. With this aim, we built autonomous decision making and learning systems
for playing checkers, based on heuristic search and different machine learning
paradigms. Each one of these systems is built as an autonomous agent using a
single paradigm, and is able to play checkers. We have organized them as a
Multi-Agent DEcision System (MADES) 0].

The MADES decides which move to make on the checkers board by means of a
distributed decision making procedure, as explained in Section 3. The idea of
building a distributed decision making and learning system to play checkers is
based on the following belief: given appropriate conditions, a group of agents
forming a MADES is expected to play better than any of them playing in
isolation. The aforementioned conditions were discussed at length in ;,‘3J by
the authors, in a general context. The purpose of integrating individual
monolithic systems into a MADES is to obtain a team performance unattainable
when the individual systems work in isolation. We believe this to be an issue
of paramount importance, since it provides a performance enhancement mechanism
(again, only when certain conditions are met).

A similar approach was used by Epstein |2J. In that case, she built a set of
game-independent Advisors, some of which could also learn using different
learning techniques. Her system had a meta-theory on how to play, independent
of the actual game being played. In our case, we differentiate between the
agents that propose a decision (a move in the case of game playing) and the
advisors that decide which agent is more appropriate for that decision. This
allows us to learn two different concepts: how to make a decision, and how to
give credit to someone making a decision. Also, her meta-theory depends very
much on the game playing paradigm, while our architecture is
domain-independent¹. Finally, her advisors did not collaborate in making
decisions, while our agents are allowed to ask the rest for advice.

In section 0 we explain the experiments we have carried out and give the 
results obtained. We evaluate these results and draw some conclusions in sec- 
tion O 



2 Machine Learning and the Game of Checkers 

The game of checkers, like most games, is full of learning opportunities for
machine learning systems. The pioneering work of Arthur Samuel [ITT]
demonstrated the use of two learning mechanisms which noticeably improved the
behaviour of his checkers program. The learning mechanisms he used in his
program were very primitive compared to the range of machine learning
formalisms available nowadays. Nonetheless, these formalisms have not yet
provided a satisfactory machine learning solution to checkers. The focus of
Samuel’s work was the study of machine learning techniques in the context of
checkers, so he resisted the temptation of hardwiring expert knowledge into his
program, because he insisted on letting the program discover that knowledge by
itself.

1 We are currently applying it to a hard induction problem, with very
encouraging results.

Supervised machine learning paradigms can be used to build game playing
programs 0 . These learning systems use past play experience and create a
summarised representation of it, which forms the basis for decision making
systems that decide which move to make. Past play experience is contained in a
training set, usually as a series of board descriptions, each followed by the
final outcome (class). Supervised machine learning paradigms try to find common
patterns in the boards that belong to the same class, and the collection of
patterns encountered is used to build a class membership criterion that is used
to classify (possibly) unseen checkers situations.

The authors have built for their experiments four different supervised learn- 
ing systems, based on the following paradigms: ID3, C4.5, bayesian learning and 
backpropagation. The training set used was a subset of 29000 randomly chosen 
elements from the DB5 database HZj. The target concepts were win, draw and 
loss for white (white to move in all the situations). For selecting which move 
to make next, the successors of the current situation are presented to these pro- 
grams. The program classifies every successor and gives it a score; the successor 
that scores highest is the preferred one, and indicates which move to make next. 
The situations of DB5 are endgames of 5 pieces at most, so the four systems 
were trained with endgames of 5 pieces or less0 Nonetheless, since we expected 
to obtain a powerful generalization as a result of the inductive nature of the 
algorithms involved, we tested the four systems with endgames of 8 pieces or 
less. 
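The move selection described above can be sketched as follows. The paper does not say how a predicted class is turned into a score, so the mapping win = 1, draw = 0, loss = -1 is an assumption, as are the name `choose_move` and the `classify` callback that stands in for any of the four trained systems.

```python
# Assumed scores for the three target concepts (from white's point of view).
CLASS_SCORE = {"win": 1.0, "draw": 0.0, "loss": -1.0}


def choose_move(successors, classify):
    """Pick the move whose successor position receives the highest score.

    successors: list of (move, position) pairs reachable from the current
    situation; classify(position) is assumed to return "win", "draw" or
    "loss" for that position.
    """
    best_move, _ = max(successors, key=lambda mp: CLASS_SCORE[classify(mp[1])])
    return best_move
```

A classifier that also reports a confidence could return a finer-grained score, but the selection step would stay the same: evaluate every successor and keep the best one.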

The four programs showed a very selective performance: a given program of
the four may play certain endgames very well, but play others poorly (the set
of checkers problems that an agent plays satisfactorily is known as its
competence region). Moreover, the endgames that were played well by one program
did not coincide with those played well by another, except, as one would
expect, in the case of ID3 and C4.5, given that C4.5 uses the same basic
techniques as ID3. Since C4.5 handled well most of the situations that ID3
handled well, plus many others, we stopped using ID3 and used C4.5 instead.

With the aim of improving the overall performance of the programs, a second
training set with 100,000 situations, randomly taken from DB5, was built. The
four systems were trained with this new training set. Surprisingly, the
backpropagation system performed worse with this second training set. We
modified the topology of the neural network with the purpose of making more
expressive power available for internal representation, but none of the
enlarged nets performed any better than the one trained with the first training
set; the neural network did not scale up well. On the other hand, the Bayes and
C4.5 systems improved remarkably, but still showed the same highly selective
behaviour.

2 Examples were randomly selected from the database. If the database examples
have any kind of bias towards a specific type of position, this does not affect
our goal; our aim is not to build the best machine learning system out of that
data, but to learn how to better combine it with other systems.

3 The reasons for this behaviour are explained in (3 .

None of the four systems that were tested performed satisfactorily. We be- 
lieve that this is because of the effect of the representation formalism used to 
represent the checkers situations in the training set. The use of an inadequate 
representation formalism can cause an unacceptable error rate. Sometimes this 
is due to the difficulty of expressing the target concept in terms of adequate 
input attributes or combinations of them. The use of meaningful intermediate 
concepts, with a more direct relation to the target concept, can alleviate this 
problem. We will illustrate this point with an example. Suppose that having 
one more king is to some extent determinant in some situations. Using the raw 
board description, the machine learning program will only notice that having 
some men in certain locations leads to some advantage (because of the existence 
of a crowning chance). To identify the boards that lead to that advantage, raw 
features will have to be combined, and many such combinations will have to be 
remembered, and in some way related to that advantage. The feature combina- 
tions that are thus grouped will be quite dissimilar. Now, let us imagine a higher 
level description for the checkers situations, using intermediate concepts (e.g. 
crowning chance in next move, crowning chance in n moves, capture chance in n 
moves, dominance of the center, victory chance in n moves, and so on), besides 
a raw description. The prior series of combinations of raw features is expressed 
now more easily, because more descriptive features are being used. This means 
that we are making the work easier for the machine learning program, because 
the common pattern is now expressed in a simpler way (one that involves fewer
features of the checkers situation description, combined more simply). This could
be achieved by careful hand-crafting of the input features, or by the use of automatic
methods, such as constructive induction [2].

Other learning approaches applied to game playing range from chunking
in chess and temporal differences in backgammon to Bayesian learning
of evaluation functions in Othello. A similar multi-agent approach to
game playing was followed by Wiering, who learned game evaluation functions
using hierarchical neural network architectures. In his case, all the agents
implement the same paradigm (they are all neural networks). All the expert 
networks used by Wiering are equally suitable for being specialized to deal with 
any subset of the domain, as opposed to MADES hybrid approaches like ours, 
where some agents are more likely to correctly classify some situations, given 
that the paradigm they implement is better suited for that task. 

3 Multi Agent Decision Systems for Checkers Play 

In this section we describe how the overall architecture works as a problem 
solving (decision making) and learning model.

Fig. 1. The Intelligent Agents Organisation.



3.1 The Composition of a MADES 

The Intelligent Agents Organization (IAO) is a model composed of multiple intelligent
heterogeneous agents that cooperate to attain a common overall goal. The IAO
structure is shown in Figure 1.

— One agent, known as the referee, is in charge of the overall system con- 
trol. It broadcasts problem instance descriptions (in our application they 
are checkers situations), and control signals to the rest of the team. It then 
receives the respective replies from the rest of the agents. These replies may
be either advice or problem solving proposals (move proposals in our
application). The services the referee may request from an agent are: solution
proposal synthesis (only from worker agents), execution of a learning session
(if the agent has learning capabilities), and advice (only from selected
agents). These service requests are scheduled in a way that maximizes
parallelism (every agent runs on a different machine), so the MADES response
time is minimized.

— The worker agents receive problem descriptions (checkers situations) from 
either the referee, or another worker, and reply with solution proposals. They 
work in parallel on a solution proposal to the same problem instance, are
capable of autonomous decision making, and some of them have learning
capabilities. Any of them could be the basis of a monolithic system aimed
at solving each problem. The MADES should learn how to organize these
worker agents to obtain a joint performance superior to the one that would
be obtained from a monolithic system built with just one of the worker
agents. The learning mechanism that accomplishes this task is distributed
reinforcement learning of workers’ competencies [5].

— Several agents may play the role of advisors. They are contacted by other
agents that wish to know which worker is expected to handle a given
problem instance best. The advisor replies with the identification of that
worker. The advice is used
by the referee as an aid for conflict resolution,⁴ and it is also used by workers
who wish to know which worker is the most appropriate for collaborating in
the solution of a problem instance.

— A trainer agent produces problem instances that are used for training and 
testing. The criteria for problem synthesis affect the success of the learning
effort. We are currently working on procedures to determine how to produce
problems that speed up learning, and that force the learning of the knowledge
needed to handle the worst-solved problem instances.

3.2 IAO Decision Making 

When a problem instance arrives at the referee, it consults the advisors to deter- 
mine whether any worker agent is expected to solve that instance of the problem 
satisfactorily. In that case, the proposal that this agent provides will be given a 
privileged status when it has to compete with the proposals of its fellow workers. 
The problem instance description is broadcast to the workers, so that they can 
work on it, and reply with a solution proposal. One advantage of the IAO is
that the advisors and the workers work in parallel, so the IAO response time
is only slightly longer than the one that would be obtained from a
monolithic system built from the most time-consuming IAO worker.

When the referee receives the proposals of all the worker agents, and the 
advice from its advisors, it has to decide which proposal to use (most of the
time these proposals will be incompatible and contradictory). The referee uses
a poll mechanism for conflict resolution: the proposal that gets the greatest 
support is the one the referee will follow. The advisors’ candidates receive extra 
votes in this poll, so they have some advantage over less credited workers. 

One of the problems we perceived in previous experiments was that when the
number of classifier workers was greater than the number of searcher workers, the
system was biased towards decisions made by classifiers, producing undesired results.
So, we defined an automatic weighting mechanism that equalizes the maximum
number of votes attainable by classifiers and the maximum number of votes
attainable by searchers.
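
The poll described above can be illustrated with a small sketch. It is not the authors' implementation: the function name `resolve_conflict`, the size of the advisor bonus, and the choice of giving each worker class a total weight of 1.0 are assumptions made only to show how extra advisor votes and class balancing might interact.

```python
from collections import defaultdict


def resolve_conflict(proposals, kinds, advised=(), advisor_bonus=0.5):
    """Return the move with the greatest weighted support.

    proposals: dict mapping worker name -> proposed move
    kinds:     dict mapping worker name -> "classifier" or "searcher"
    advised:   workers recommended by the advisors (they get extra votes)
    advisor_bonus: hypothetical size of the extra vote
    """
    # Count the workers of each class, so that every class can cast
    # at most 1.0 vote in total regardless of how many members it has.
    class_size = defaultdict(int)
    for worker in proposals:
        class_size[kinds[worker]] += 1

    support = defaultdict(float)
    for worker, move in proposals.items():
        weight = 1.0 / class_size[kinds[worker]]
        if worker in advised:
            weight += advisor_bonus
        support[move] += weight

    # max() keeps the first maximal entry, so ties go to the move
    # that was proposed first.
    return max(support, key=support.get)
```

With three classifiers and two searchers, two agreeing classifiers (2/3 of a vote) no longer outvote two agreeing searchers (a full vote), which is the bias the weighting mechanism is meant to remove.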



3.3 Learning in IAO 

Two different kinds of learning take place in IAO. First, the autonomous worker
agents with learning capabilities can learn on their own how to do their
respective work. This is usually called centralized learning. Second, the
advisors learn the workers’ competencies. This is a form of distributed learning.
Centralized learning deals with knowledge about solutions that will permit a
worker agent to solve the problems it is presented with, so it can be carried out
locally by the agent, isolated from the rest of the team. Conversely, distributed learning
of agents’ competencies requires the use of global information, because it is based
on distributed credit assignment, which analyzes the performance of the MADES
as a whole and of the workers individually, with the goal of learning competencies.
This kind of learning is what makes the synergetic effect possible.

4 For instance, when the agents disagree about which move should be made, the referee
uses this advice to decide which alternative to take.

The centralized learning algorithms are the same ones used in monolithic 
systems. Distributed learning is actually what will allow the team of agents to 
perform better than the individuals on their own. In this process, the advisors 
analyze how satisfactory the solution the MAS produced is. 

We have designed an algorithm to learn workers’ competencies under the
following hypothesis: if a worker is the most competent in the solution of a
certain problem, it is also expected to be the most competent in the solution of
another problem of the same difficulty and appearance. If the problem space is
partitioned into subsets that contain similar problem instances, the competence
data known for a certain problem instance is expected to be valid also for the
rest of the problem instances lying in the same subset of the partition. This is
a generalization mechanism whose success depends on the similarity measure
used.

How many subsets are used, and what the similarity metric is, depend on
the kind of problem. The goal of this partition is to enclose, in a single subset,
problem instances for which any worker agent deserves the same credit. The
intended learning will be only as reliable as the degree to which this requirement
is fulfilled. A reinforcement table is associated with every subset. In such a table, a
reinforcement value is associated with every worker agent; it expresses how adequate
the worker agent is for the solution of problem instances lying in the subset. These
tables are used by a reinforcement advisor agent: once a problem instance
arrives at it, it locates the subset of the partition the problem belongs to, and the
associated reinforcement table. Then, it determines which worker is the most adequate
according to the table. If such a worker exists, the advisor replies to the
referee with the worker’s identification. Otherwise, it informs the referee about
the lack of discerning data. After each problem solving episode, an “a posteriori”
analysis determines the participation of the workers in the solution, and they
are reinforced accordingly. The analysis and the reinforcement learning effort
are carried out by the reinforcement advisor (the referee collects and
prepares the data for this process during the problem solving episode).
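
A minimal sketch of such a reinforcement advisor is given below. The text does not specify the update rule or the partition function, so the class name `ReinforcementAdvisor`, the `subset_of` callback, and the exponential-average update with a `learning_rate` are all assumptions chosen for illustration.

```python
class ReinforcementAdvisor:
    """Keeps one reinforcement table per subset of the problem space."""

    def __init__(self, subset_of, learning_rate=0.1):
        # subset_of maps a problem instance to the key of its subset
        # (a hypothetical stand-in for the similarity-based partition).
        self.subset_of = subset_of
        self.learning_rate = learning_rate
        self.tables = {}  # subset key -> {worker name: reinforcement}

    def advise(self, problem):
        """Return the most adequate worker, or None if there is no discerning data."""
        table = self.tables.get(self.subset_of(problem))
        if not table:
            return None
        ranked = sorted(table.items(), key=lambda item: item[1], reverse=True)
        # A tie at the top gives no basis for preferring one worker.
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            return None
        return ranked[0][0]

    def reinforce(self, problem, worker, reward):
        """Move the worker's reinforcement towards the observed reward (assumed rule)."""
        table = self.tables.setdefault(self.subset_of(problem), {})
        old = table.get(worker, 0.0)
        table[worker] = old + self.learning_rate * (reward - old)
```

A table is created the first time a subset is reinforced, which mirrors the observation later in the paper that new reinforcement tables keep appearing while the domain is not yet covered.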

4 Implementation and Experiments 

To evaluate the IAO model, we have built a MADES composed of 8 agents (see
Figure 2):

— The referee agent. 

— An advisor agent, known as reinfAG, that builds and consults the appro- 
priate reinforcement table, in order to advise other agents about the agent 
that is expected to solve most satisfactorily a given problem instance. 

— The C4.5AG agent, based on Quinlan’s C4.5, running in C and trained on
100,000 instances randomly taken from Schaeffer’s DB5 database of
checkers endgames.

Fig. 2. Architecture of the checkers Multi-Agent System.



— backpropAG, a connectionist worker agent. We have built a neural network
that learns by means of the backpropagation with momentum learning
algorithm. It has been trained with 29,000 examples taken from DB5.

— bayesAG, a bayesian classifier trained on the same 100,000 instances used 
by C4.5AG, running in C. 

— alphaAG, an alpha-beta based worker agent with decision making capabil- 
ities only. Search has been constrained, so that the maximum search depth 
is limited to 5, and the maximum number of moves that the move generator 
outputs is 12. The purpose of this severe search constraint is to impose a 
time limit per move, brief enough to allow the execution of many
experiments,⁵ and long enough to yield interesting play. Again, the main goal
of this research (for now) is to learn how to combine several agents (strong 
or not), but not to build the best ever player. 

The knowledge this agent uses is hardwired into its evaluation function and
into its move generator (the moves believed to be most interesting are
generated first). A simple evaluation function has been used, so that it is
cheap enough to be computed many times. The evaluation function computes
the material difference between both sides,
weighting each piece by an amount that reflects the importance of the board
area it dominates. This naive evaluation function provides reasonably good
play in most common situations.

— hybridAG, a heuristic searcher based on alpha-beta. When a search tree
leaf is reached, this worker asks reinfAG which worker is expected
to handle this leaf situation best. If that worker is available, hybridAG
requests from it the evaluation of the leaf node. If that worker is not available,
hybridAG performs the evaluation of the leaf node locally. Notice that this
is a loosely coupled hybrid system, and that the searcher will be coupled
with different classifiers at different moments.

5 Currently the average speed of problems played is around 100 per day, when the
MADES runs using 3 Linux PCs, 1 SPI 4MP, and 2 HP Apollo workstations, shared
with other users and applications. The availability of more powerful hardware would
make it possible to reach selective search depths comparable to those of competition
programs.

— A trainer agent that produces problem instances that are used for training
and testing. These problems are produced in a balanced fashion: there are as
many situations that are wins (or losses) for white as there are for black. This
is accomplished as follows: first, a checkers situation is randomly generated
according to some restrictions (e.g. the total number of pieces must be less
than or equal to a given constant); then, its inverted form is computed. If the
first situation was a win for black, the next situation produced will be its
inversion, i.e. a win for white. So, neither side is favored by the trainer.
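
The search constraints described for alphaAG (a depth limit of 5 and at most 12 moves per position) can be sketched as a generic depth-limited alpha-beta routine. This is an illustrative sketch, not the agent's code: the callbacks `moves`, `apply_move` and `evaluate` are hypothetical placeholders for the move generator, the move executor and the evaluation function, and the move generator is assumed to return the most interesting moves first.

```python
import math


def alphabeta(state, depth, alpha, beta, maximizing,
              moves, apply_move, evaluate, max_moves=12):
    """Depth-limited alpha-beta that considers at most max_moves moves per node."""
    # Only the first max_moves moves are searched; the generator is
    # assumed to order them from most to least interesting.
    candidates = moves(state)[:max_moves]
    if depth == 0 or not candidates:
        return evaluate(state)

    if maximizing:
        value = -math.inf
        for move in candidates:
            child = apply_move(state, move)
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False,
                                         moves, apply_move, evaluate, max_moves))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # beta cut-off
        return value

    value = math.inf
    for move in candidates:
        child = apply_move(state, move)
        value = min(value, alphabeta(child, depth - 1, alpha, beta, True,
                                     moves, apply_move, evaluate, max_moves))
        beta = min(beta, value)
        if alpha >= beta:
            break  # alpha cut-off
    return value
```

A call such as `alphabeta(position, 5, -math.inf, math.inf, True, moves, apply_move, evaluate)` would correspond to the depth limit quoted above; truncating the move list trades accuracy for a bounded time per move.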

The agents communicate using the TCP/IP protocol over the Internet. Since
the computers reside in the same network segment, communication
takes much less time than the local servicing of requests. If
computers in very distant locations were used, the slowness of the network
during heavy traffic hours would need to be considered.

For training the MADES, 16,000 checkers problems were generated by the
trainer agent. The checkers problems were played until either one side won or
a draw was reached. The draw criterion we used was to detect when a series of
moves was being cyclically repeated.

The MADES played these problems against alphaAG, and a learning session 
was executed after every problem was played. Since our aim is to prove that the 
MADES can learn to make decisions in such a way that it beats any of its worker 
agents playing on its own, the test games were played between the MADES and 
each of its workers in turn. A set of 100 test problems was produced by the 
trainer, and this set was used in all the matches. The following results were 
obtained: 



opponent      MADES advantage
C4.5AG        42%
backpropAG    35%
bayesAG       34%
hybridAG      4%
alphaAG       4%


The MADES advantage is computed as the number
of games won by the MADES minus the number of games won by its opponent,
divided by the number of games played, and multiplied by 100 to obtain a
percentage. This is equivalent to expressing the game equity as a percentage. We
express this calculation in the following formula:

$$\mathrm{adv}(G) = \frac{w(G) - l(G)}{|G|} \times 100$$

where G is the set of games played, |G| is the number of played games, w(G)
is the number of games won by the MADES, and l(G) is the number of games
lost by the MADES. Since we played 100 games per match, the formula above
simplifies in this case to the difference between the games won
by the MADES and the games won by its opponent. Further experiments with
matches consisting of 24 and 50 games closely approximated the results shown
here.
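
The advantage measure is simple enough to state as a short function. The sketch below only restates the formula; the function name `mades_advantage` is hypothetical, and the win and loss counts in the usage note are made-up inputs, not results from the experiments.

```python
def mades_advantage(wins, losses, games):
    """Return (wins - losses) / games as a percentage.

    wins and losses are counted from the MADES point of view;
    draws contribute to games but to neither wins nor losses.
    """
    if games <= 0:
        raise ValueError("at least one game must have been played")
    if wins + losses > games:
        raise ValueError("wins plus losses cannot exceed the number of games")
    return 100.0 * (wins - losses) / games
```

For a 100-game match the value reduces to `wins - losses`: a hypothetical match with 60 wins, 18 losses and 22 draws would give `mades_advantage(60, 18, 100)`, i.e. an advantage of 42.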

The results show that the MADES beats any of its members. We believe 
that this gain in the quality of play justifies by itself the construction of the 
MADES from the standalone systems. We are currently in the MADES training 
stage, where, after every training game, reinforcement tables are updated. Since 
the reinforcement tables have not yet converged, we expect that the results will 
improve as the tables get near their convergence values. 

The flexibility of the IAO model allows the replacement of a worker by
another, and the adaptation of the rest of the system to the new MADES
composition, thanks to the adaptive behaviour of the advisor. We replaced two workers
during the MADES’s lifetime; the first replacement improved the MADES’s score
by 6.8%, and the second by 4.68%.

5 Discussion 

What the results show might seem obvious to anyone: an isolated system should 
be beaten (or drawn) by a team formed by itself plus other agents. What is not 
obvious at all is to determine under which conditions this is feasible, and to learn 
to control the system in a way that assures inter-agent cooperation and prevents 
inter-agent hampering. It should be noted that there is no “a priori” clue about 
how the individual systems should be combined, and that the MADES learns 
how to perform this combination on its own. This is the main contribution of 
the present work. 

Initially, the competence and preference regions of the worker agents are 
unknown. They are learnt by the system, and this information is used to influence 
the way the MADES makes decisions. This is quite different from putting together
a program with good openings, a program with good middlegame play and
a program with good endgame play, because in the latter case one has “a
priori” knowledge of the individual systems’ capabilities, while in our model the
individual capabilities are not known “a priori”, so the MADES has to learn
them. This is very common in machine learning: when one builds a decision
system based on the output of a machine learning paradigm, one usually cannot
foresee which instances will be correctly or incorrectly classified (i.e. one
lacks “a priori” knowledge about the competence region of the system, except
for some very coarse-grained guesses based on the composition of the training
set). So if computer checkers is to take advantage of decision systems based on
multiple autonomous agents (learning or not), a method for learning dynamic
competencies like the one we propose here is needed, because in the general case
no “a priori” knowledge will be available for determining which agents
should be heeded at a given time.⁶



6 There will be no informed or reliable way to combine individual decisions to reach
a common overall decision.

The focus of this work is the study of techniques for distributed decision
making and learning in MADES, and their application to the construction of
a checkers playing program. We are aware that the checkers system presented
here does not take advantage of the latest heuristic search enhancements, as
championship level programs do. Moreover, we restricted ourselves to checkers
problems of 8 pieces at most in our experiments. The assessment of the results
reported here provides an experimental background to support the adequacy
of the theoretical IAO model. Now that we have tested the adequacy of the
techniques involved, we wish to start working on producing a championship
level program, enlarging our scope to the whole game of checkers, and refining
and improving the existing agents.

The results improve as learning progresses due to two effects. On the one
hand, the more extensively the reinforcement tables cover the domain, the more
information is available for reinfAG. We are currently in this first stage, and
the domain is not completely covered yet. This means that reinfAG cannot give
advice in some situations because there is no reinforcement table corresponding
to the subsets those situations belong to. For the MADES to be mature, the
whole domain must be covered, and the reinforcement tables must get near their
convergence values.

On the other hand, the convergence of the reinforcement tables is a sec- 
ond learning stage that will primarily take place after the covering is done. We 
believe the MADES has not even passed the covering stage, because new re- 
inforcement tables are often created during learning sessions. We expect the 
results to improve when the problem domain is totally covered by the union
of the competence regions of the worker agents, and the reinforcement tables 
have approached their convergence values.

Abstract. In this paper a theory of game tree algorithms is presented,
entirely based upon the concept of a solution tree. Two types of solution 
trees are distinguished: max and min trees. Every game tree algorithm 
tries to prune as many nodes as possible from the game tree. A cut-off 
criterion in terms of solution trees will be formulated, which can be used 
to eliminate nodes from the search without affecting the result. Further, 
we show that any algorithm actually constructs a superposition of a max 
and a min solution tree. Finally, we will see how solution trees and the 
related cutoff criterion are applied in major game tree algorithms like 
alphabeta and MTD. 

Keywords: Game tree search, Minimax search, Solution trees, Alpha- 
beta, SSS*, MTD. 



1 Introduction 

A game tree models the behavior of a two-player game. Each node n in such a 
tree represents a position in a game. An example of a game tree with game values 
is found in Figure 1. The players are called Max and Min. Max moves from
the square nodes, Min from the circle nodes. The game value f(p) for a position 
p may be defined as the guaranteed pay-off for Max. This function obeys the 
minimax property. An algorithm computing the guaranteed pay-off in a node 
n is called a game tree algorithm. Over the years many algorithms have been 
designed. Every algorithm tries to eliminate as many nodes as possible from the 
game tree search. So every algorithm has its own cut-off criterion. We will design 
a cut-off criterion derived from a theory of game trees. In this theory the notion 
of a solution tree is the key notion, which turns out to be a powerful tool for 
establishing such a criterion. We show that, in the family of search algorithms 
obeying the cut-off criterion, alphabeta is the depth-first instance, whereas MT- 
SSS is the best-first instance. Besides, we show in an obvious way that every 
algorithm necessarily builds a critical tree. 

This paper is organized as follows. In Section 2 we recall some facts on
solution trees mentioned earlier by Stockman [9]. In Section 3 the notion of a search
tree is recalled. This notion has been introduced by Ibaraki [3]. Next, minimax
functions on a search tree are defined, and the role of solution trees in a search
tree is discussed. Section 4 presents a general theory on game tree algorithms
based upon solution trees. A general cut-off criterion is the most important
result. Section 5 connects two well-known game tree algorithms to the cut-off
criterion. The reader is referred to [7] for details on this connection.

Fig. 1. A game tree with f-values.

To conclude the opening section some preliminaries are given. A game tree is 
denoted by G and its root is denoted by r throughout this paper. Given a state- 
ment related to a game tree, replacing the terms max/min by min/max yields 
the so-called dual statement. 

2 Solution Trees 

A strategy of Max in a tree G is defined as a subtree, including in each max
node exactly one continuation and in each min node all continuations (all
countermoves to Max). Since the choice of Max in each position is known in such a
subtree, Max is able to calculate the outcome for each series of choices that his
opponent can make. In this paper a subtree with exactly one child in an internal
max node and all children in a min node, which we have called a strategy for
Max, will also be referred to as a min solution tree, or briefly a min tree. Dually,
a strategy for Min is defined, also called a max solution tree or a max tree. In
Figure 1 the bold edges generate a max tree. A max tree is denoted by $T^+$ and
a min tree by $T^-$ in this paper.

Given a min solution tree, the most beneficial choice for Min in each min node
is a move towards a terminal with minimal value. Consequently, in a given min
tree (Max strategy) $T^-$, the profit for Max under optimal play of Min is equal to
the minimum of all pay-off values in the terminals of $T^-$. Therefore we introduce
the following function g for a max tree $T^+$ and a min tree $T^-$:

$$g(T^+) = \max\{f(p) \mid p \text{ is a terminal in } T^+\} \qquad (2.1)$$

$$g(T^-) = \min\{f(p) \mid p \text{ is a terminal in } T^-\} \qquad (2.2)$$

The intersection of a max tree $T^+$ and a min tree $T^-$ consists of exactly one
path. The g-definition implies that $g(T^-) \le f(p_0) \le g(T^+)$, where $p_0$ denotes
the terminal at the end of the intersection path. It follows that $g(T^-) \le g(T^+)$
for any two solution trees $T^+$ and $T^-$ in a game tree.

Suppose that the Max player confines himself to a certain tree $T^-$. Then Max
achieves a pay-off of $g(T^-)$, if Min replies consistently towards a terminal with
value equal to $g(T^-)$. If Min deviates from a path towards a terminal equal to
$g(T^-)$, Max gets a higher pay-off. Hence, $g(T^-)$ is the guaranteed pay-off for
Max playing in $T^-$. It follows that the highest attainable pay-off for Max is equal
to the maximum of the values $g(T^-)$ in the set of all min trees $T^-$. Dually, the
most beneficial pay-off from the viewpoint of Min is equal to the minimum of
the values $g(T^+)$ in the set of all max trees $T^+$. Since the guaranteed pay-off is
equal to the game value by definition, we come to the following equality holding
in each node n of a game tree:

$$f(n) = \max\{g(T^-) \mid T^- \text{ a min tree rooted in } n\} = \min\{g(T^+) \mid T^+ \text{ a max tree rooted in } n\}$$

This equality can be proved formally by means of induction on the height of n.
Since this equality is due to Stockman [9], it will be referred to as Stockman’s
theorem in this paper.
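
Stockman’s theorem can be checked by brute force on a small tree. The sketch below is only an illustration under assumed conventions: a game tree is a nested Python list whose leaves are pay-off numbers, the root is a max node unless stated otherwise, and the function names are hypothetical.

```python
from itertools import product


def minimax(node, is_max=True):
    """Plain minimax value f(n) of a nested-list game tree."""
    if not isinstance(node, list):
        return node
    values = [minimax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)


def solution_trees(node, is_max, pick_at_max):
    """Yield the list of terminal values of every solution tree rooted in node.

    pick_at_max=True:  one child in max nodes, all children in min nodes (min trees).
    pick_at_max=False: the dual choice (max trees).
    """
    if not isinstance(node, list):
        yield [node]
        return
    per_child = [list(solution_trees(child, not is_max, pick_at_max))
                 for child in node]
    if is_max == pick_at_max:
        # Exactly one continuation is kept.
        for options in per_child:
            yield from options
    else:
        # All continuations are kept: combine one subtree per child.
        for combo in product(*per_child):
            yield [value for leaves in combo for value in leaves]


def stockman_values(tree, root_is_max=True):
    """Return (max over min trees of g, min over max trees of g)."""
    best_min_tree = max(min(t) for t in solution_trees(tree, root_is_max, True))
    best_max_tree = min(max(t) for t in solution_trees(tree, root_is_max, False))
    return best_min_tree, best_max_tree
```

For any such tree both components of `stockman_values(tree)` should coincide with `minimax(tree)`. The enumeration is exponential in the tree size and is meant only to make the definitions concrete.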



3 The Search Tree 

So far, we were dealing with complete game trees. However, in every game tree 
algorithm the tree is built up step by step. At any time during execution a sub- 
tree of the game tree has been generated. Such a subtree is called a search tree. 
We assume that, as soon as at least one child of a node n is generated, all other 
children of n are also added to the search tree. If the children of a node n have 
been generated, n is called expanded or closed. If a non-terminal n has no
children in a search tree (and hence n is a leaf in this search tree), then n is called
open. A terminal n is called closed or open respectively, according to whether its
pay-off value has been computed or not. The foregoing definitions of open and 
closed imply, that an open leaf in a search tree either is a non-terminal, whose 
children have not been generated yet, or is a terminal, whose game value has not 
been computed yet. Obviously, every closed leaf in a search tree is a terminal in 
the game tree.

Fig. 2. A search tree derived from Figure 1.



Since f(p) is not known yet in an open node p of a search tree S, the minimax
function cannot be applied in S. To get an idea of the game values we
assign two preliminary values to each open leaf. First, we assign $+\infty$ as a
preliminary value. This gives rise to a function $f^+$ in a search tree S, defined as the
minimax function in S assuming $f^+(p) = +\infty$ as game value in each open leaf
p and $f^+(p) = f(p)$ in each closed leaf. Second, we assume $-\infty$ as game value
in the open leaves. The related minimax function is called $f^-$. In every node n
the inequality $f^-(n) \le f(n) \le f^+(n)$ holds, which can be shown by induction
on the height of n. See Figure 2 for an instance of a search tree derived from
Figure 1. The nodes a, b, c, f and i are open leaves, whereas d, e, g and h are
closed leaves (terminals that have their game values evaluated). In each node n
of Figure 2 the top value denotes $f^+(n)$ and the bottom value denotes $f^-(n)$.
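
The two bounding functions are easy to compute once open leaves are marked. In the sketch below, which uses an assumed representation rather than anything from the paper, a search tree is a nested list, a closed leaf is a number, and an open leaf is `None`.

```python
import math

OPEN = None  # marker for an open leaf (hypothetical representation)


def bound(node, is_max, open_value):
    """Minimax value of a search tree with open leaves replaced by open_value."""
    if node is OPEN:
        return open_value
    if not isinstance(node, list):
        return node
    values = [bound(child, not is_max, open_value) for child in node]
    return max(values) if is_max else min(values)


def f_plus(node, is_max=True):
    """Upper bound f+: every open leaf counts as +infinity."""
    return bound(node, is_max, math.inf)


def f_minus(node, is_max=True):
    """Lower bound f-: every open leaf counts as -infinity."""
    return bound(node, is_max, -math.inf)
```

However the open leaves are eventually filled in, the true minimax value of the root stays between `f_minus(tree)` and `f_plus(tree)`, which is the inequality stated above.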

In a search tree with minimax function $f^+$ Stockman’s theorem can be
applied. Likewise, this theorem can be applied to the $f^-$-function. To rule out the
annoying nodes with infinite values, we introduce a new definition. For a max
and a min tree in a search tree this new g-definition will be given below (it is
similar to the c-function). This definition is a generalization of definitions (2.1)
and (2.2), which only hold for solution trees in a complete game tree, i.e., a tree
with solely closed nodes.

$$g(T^+) = \max\{f(p) \mid p \text{ is a closed terminal in } T^+\} \qquad (3.1)$$

$$g(T^-) = \min\{f(p) \mid p \text{ is a closed terminal in } T^-\} \qquad (3.2)$$

Applying Stockman’s theorem to $f^+$ and $f^-$ respectively leads to the
equalities below. Although Stockman’s theorem deals with the old g-definition, these
equalities are also valid for the new g-definition. By a closed solution tree we
mean a solution tree in a search tree with solely closed leaves.

$$\begin{aligned}
f^+(n) &= \min\{g(T^+) \mid T^+ \text{ is a closed max tree with root } n\} && (3.3)\\
       &= \max\{g(T^-) \mid T^- \text{ is a min tree with root } n\} && (3.4)\\
f^-(n) &= \max\{g(T^-) \mid T^- \text{ is a closed min tree with root } n\} && (3.5)\\
       &= \min\{g(T^+) \mid T^+ \text{ is a max tree with root } n\} && (3.6)
\end{aligned}$$



Here we assume that the minimum/maximum of the empty set is $+\infty$/$-\infty$.

We will comment on the formulas for $f^+$. (The formulas for $f^-$ are dual.) For
a min tree $T^-$ with $+\infty$ as the game value in the open nodes and for a closed
max tree $T^+$, the old and the new g-definition yield the same value. For a
non-closed max tree $T^+$, the g-value in the old sense equals $+\infty$. Consequently, the above
equalities for $f^+(n)$ should be regarded as an application of Stockman’s theorem,
where non-closed max trees (with infinite g-value) are left out of consideration
in the right-hand side of (3.3).

4 A General Theory 

In this section a general theory on game tree algorithms is developed. A key role 
is played by the notions alive and dead. 



4.1 Alive Nodes 

In this subsection the definition and the significance of the notion alive are
discussed. The definition of an alive node is as follows. A node n in a search tree S
is called alive if n is on the intersection path of a max tree $T^+$ and a min tree
$T^-$ (either rooted in r) with $g(T^+) < g(T^-)$.

Given an alive node n in a search tree S, we can construct a game tree $G_n \supseteq S$,
whose game value can only be obtained if one particular open descendant¹ of
n is expanded. The construction of $G_n$ proceeds as follows. Denote the
actual values $g(T^+)$ and $g(T^-)$ by $g_1$ and $g_2$ respectively. The leaf $p_0$ at the
end of the intersecting path must be open, since, if it was not, we would have
$g(T^-) \le f(p_0) \le g(T^+)$. Choose a value $f_0$ with $g_1 \le f_0 \le g_2$. Define $f(p_0) = f_0$
and $f(p) \le g_1$ for any open node $p \ne p_0$ in $T^+$ and $f(q) \ge g_2$ for any open node
$q \ne p_0$ in $T^-$. To complete $G_n$, the other open nodes in S (if any) are closed
arbitrarily. After being extended, both $T^+$ and $T^-$ have g-values equal to $f_0$.
Stockman’s theorem entails that the game value of $G_n$ equals $f_0$. As long as $p_0$
is not closed in $G_n$, $T^+$ and $T^-$ satisfy $g(T^+) = g_1 < g_2 = g(T^-)$ and every
value in the range $[g_1, g_2]$ is still achievable as game value for r.

1 In this paper each node n is assumed to be its own descendant.

Fig. 3. A max and a min tree, derived from Figure 2.

The above construction is illustrated using Figures 2 and 3. Figure 3 shows
solution trees $T^+$ and $T^-$ with $g(T^+) = 3$ and $g(T^-) = 8$. Node $u$ is on the
intersecting path and is therefore alive. The game tree $G_u$ is constructed by
defining $f(b) = f_0$ with $f_0 \in [3, 8]$, $f(c) < 3$ and $f(a) > 8$. The nodes $f$ and $i$ in
Figure 2 may be closed arbitrarily.

4.2 Definition of the h-Functions

In this subsection, we give an alternative definition for the notion alive. This
definition implies a practical method to establish whether a node is alive, using
the so-called $h$-functions. These $h$-functions in a search tree $S$ are defined as:

$$h^-(n) = \min\{\, g(T^+) \mid T^+ \text{ a max tree in } S \text{ through } r \text{ and } n \,\} \qquad (4.1)$$

$$h^+(n) = \max\{\, g(T^-) \mid T^- \text{ a min tree in } S \text{ through } r \text{ and } n \,\} \qquad (4.2)$$

It is easily seen that a node $n$ is alive iff $h^-(n) < h^+(n)$.

As a result of (3.4) and (3.6) respectively, the definition of the $h$-functions
reduces in the root to $h^+(r) = f^+(r)$ and $h^-(r) = f^-(r)$. Since every solution
tree considered in the above definition goes through $r$, $r$ has a maximal
$h^+$-value and a minimal $h^-$-value in any given search tree $S$.

Extending the equality $f^+(r) = h^+(r)$, we will give formulas for the $h$-functions
in any other node. Those formulas are of highly practical significance. To this
end we need a new notion. Denote by $AMAX(n)$ / $AMIN(n)$ the set of max/min
nodes that are proper ancestors of $n$. The following interesting formulas hold
for the $h$-values in a node $n$ of a search tree $S$.

$$h^-(n) = \max\{\, f^-(m) \mid m \in \{n\} \cup AMAX(n) \,\} \qquad (4.3)$$

$$h^+(n) = \min\{\, f^+(m) \mid m \in \{n\} \cup AMIN(n) \,\} \qquad (4.4)$$

We give a sketch of the proof for (4.3). As a result of (3.6), every node
$m \in \{n\} \cup AMAX(n)$ is the root of a max tree $T_m$ with $g(T_m) = f^-(m)$. In the
superposition of all those trees $T_m$, we choose arbitrarily a max tree $T^+$
through $r$ and $n$. This tree $T^+$ satisfies the following equality:
$g(T^+) = h^-(n) = \max\{\, f^-(m) \mid m \in \{n\} \cup AMAX(n) \,\}$. See [ZJ for a full proof.
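Formulas (4.3) and (4.4) translate directly into a single top-down pass over the search tree. The following is a minimal sketch under stated assumptions: the `Node` class, `bound`, `h_values` and `is_alive` are hypothetical names introduced here, and expanded nodes are assumed to have all their children in the tree. The pass carries along the maximum of $f^-$ over the max-type proper ancestors and the minimum of $f^+$ over the min-type proper ancestors.

```python
import math


class Node:
    """Hypothetical search-tree node; an open node has no value and no children."""

    def __init__(self, kind, value=None, children=None):
        self.kind = kind            # "max" or "min"
        self.value = value          # payoff of a closed terminal, else None
        self.children = children or []


def bound(node, open_value):
    """Minimax with open nodes valued open_value: +inf gives f+, -inf gives f-."""
    if not node.children:
        return open_value if node.value is None else node.value
    values = [bound(child, open_value) for child in node.children]
    return max(values) if node.kind == "max" else min(values)


def h_values(root):
    """Return {id(node): (h_minus, h_plus)} according to (4.3) and (4.4)."""
    table = {}

    def walk(node, anc_max, anc_min):
        # anc_max: max of f- over max-type proper ancestors (AMAX)
        # anc_min: min of f+ over min-type proper ancestors (AMIN)
        h_minus = max(bound(node, -math.inf), anc_max)
        h_plus = min(bound(node, math.inf), anc_min)
        table[id(node)] = (h_minus, h_plus)
        for child in node.children:
            if node.kind == "max":
                walk(child, h_minus, anc_min)   # node joins AMAX of the child
            else:
                walk(child, anc_max, h_plus)    # node joins AMIN of the child

    walk(root, -math.inf, math.inf)
    return table


def is_alive(node, table):
    """A node is alive iff h-(n) < h+(n)."""
    h_minus, h_plus = table[id(node)]
    return h_minus < h_plus


# Open node p sits below a min node whose other child is already worth 3,
# while the sibling subtree b guarantees the root a value of 5.
p = Node("max")
a = Node("min", children=[Node("max", value=3), p])
b = Node("min", children=[Node("max", value=8), Node("max", value=5)])
root = Node("max", children=[a, b])

table = h_values(root)
print(table[id(p)], is_alive(p, table))
```

Here `p` has $h^-(p) = 5$ and $h^+(p) = 3$, so it is not alive. This sketch recomputes the bounds at every node for brevity; an implementation concerned with efficiency would cache them.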



4.3 Dead Nodes 

A node that is not alive is called dead. It is easily shown that every ancestor of
an alive node is alive as well. As a result, a descendant of a dead node is dead.
In terms of the $h$-functions, we may state that $n$ is dead iff $h^-(n) \ge h^+(n)$. The
$h$-functions will be utilized in this subsection to derive some properties of dead
nodes.

Consider a given max tree $T^+$ including an open dead node $p$. By the definition of
$h^-$ we have $g(T^+) \ge h^-(p)$. There is a node $m \in AMIN(p)$ with $f^+(m) = h^+(p)$
due to (4.4), and $m$ is the root of a closed max tree $T'$ with $g(T') = f^+(m)$ due to
(3.3). Hence, $h^+(p)$ is associated not only with a min tree through $r$ and $p$ (by
definition), but also with a max tree rooted in a node $m \in AMIN(p)$. We perform the
following transformation to the given tree $T^+$. Remove the subtree below $m$ from
$T^+$ and append $T'$ to $T^+$ in $m$. Since $g(T^+) \ge h^-(p) \ge h^+(p) = f^+(m) = g(T')$,
the $g$-value does not increase by this transformation. Since $T'$ is a closed solution
tree, the transformed tree has solely closed leaves below $m$. In a similar way, any
other open dead node can be eliminated from $T^+$. The resulting tree does not
include any open dead node and its $g$-value does not exceed the original value
$g(T^+)$.

To illustrate the above transformation, see Figure 2. It is easily seen using (4.3)
and (4.4) that $h^-(i) = 6$ and $h^+(i) = 3$, meaning that $i$ is dead. Any max tree
$T^+$ through $i$ has $g$-value $\ge 6$. The subtree below $v$ in such a tree $T^+$ may be
replaced by the max tree rooted in $v$ and ending up in the terminals $e$ and $g$.

Given an alive node $n$, the above transformation can be applied to a max tree
associated with $h^-(n)$, i.e., a max tree $T^+$ such that $g(T^+) = h^-(n)$. This results
in a new tree avoiding open dead nodes and not exceeding $h^-(n)$ by its $g$-value.
Since $f^+(m') \ge h^+(n) > h^-(n) = g(T^+)$ for every node $m' \in \{n\} \cup AMIN(n)$,
replacing a subtree of $T^+$ rooted in $m'$ with a closed subtree would raise the
$g$-value. We conclude that no node from $\{n\} \cup AMIN(n)$ is involved in the above
transformation of $T^+$ eliminating open dead nodes. It follows that, given an
alive node $n$, a solution tree $T^+$ through $n$ associated with $h^-(n)$ can be found
avoiding any open dead node. As long as the algorithm does not expand an open
node of this tree $T^+$, the value $h^-(n)$ is unaffected. Therefore, while expanding
a dead node, the $h^-$-value of any alive node in a search tree is not affected. For
reasons of duality, any alive $h^+$-value isn't affected either.

4.4 Main Theory 

We now come to our theory consisting of four observations. 

a) We have shown in subsection 4.1 that, if n is alive, a game tree G n can 
be constructed, in which f(r) is unknown as long as one particular open 
descendant $p_0$ of $n$ is not expanded. The conclusion is that any alive node $n$
cannot be discarded. 

b) As a result of a), an algorithm must continue as long as the search tree 
contains any alive nodes. Therefore, the algorithm may only stop when all 
nodes in the search tree are dead. We might say therefore, that any game 
tree algorithm actually aims at killing the entire search tree. 

c) All nodes in a search tree $S$ are dead iff $g(T^+) \ge g(T^-)$ for any two solution
trees $T^+$ and $T^-$ in $S$. As a result of (3.4) and (3.6), this condition is
equivalent to the equality $f^-(r) = f^+(r)$, which is therefore the stop criterion
of every game tree algorithm. When the condition $f^-(r) = f^+(r)$ is achieved,
both a closed max tree and a closed min tree with $g$-value equal to $f(r)$ are
present in the search tree. The superposition of these two trees is called a 
critical tree. This notion has been introduced in |Ej, with a totally different 
definition however. Since the algorithm must continue until $f^-(r) = f^+(r)$
holds, we conclude that every game tree algorithm needs to build a critical 
tree. 

The intersection of the max and the min tree in a critical tree is a path with
constant $f$-value, as can easily be shown using Stockman's theorem.

d) Expanding descendants of a dead node does not affect the $h$-values of any
alive node, as we have shown in subsection 4.3. Consequently, an alive node 
can only be killed by expanding an alive node. For a game tree algorithm to 
achieve its goal, every node needs to be killed. Therefore, expanding a dead 
node is useless. Since every dead node has solely dead descendants, a dead 
node along with the subtree underneath may be neglected during the search. 

Notice that the notes a) and d) constitute a general cut-off criterion for game 
tree algorithms: alive nodes must be respected, dead nodes may be neglected. 
Note c) describes the situation on termination of a game tree algorithm. 

5 Two Well-Known Game Tree Algorithms

In this section we will discuss how some well-known game tree algorithms fit
into our theory. The results are not proved. See f° r details. 

5.1 The Alphabeta Algorithm 

An extensive treatment of the alphabeta procedure can be found in [6]. A procedure
call $alphabeta(n, \alpha, \beta)$ has three parameters: a node $n$ in a game tree, and
two numbers $\alpha$ and $\beta$. The precondition is $\alpha < \beta$. We present a postcondition
of alphabeta, which extends the postconditions in [6] and [13], in that it relates
the new functions $f^+$ and $f^-$ to the return value of an alphabeta call. For an
alphabeta call with return value $v$, the following postcondition applies:

$v \le \alpha \Rightarrow v = f^+(n)$,
$\alpha < v < \beta \Rightarrow v = f(n)$,
$v \ge \beta \Rightarrow v = f^-(n)$.

In case of $v = f^+(n)$, the search tree contains a closed max tree $T^+$ with
$g(T^+) = f^+(n)$, cf. (3.3). An extra feature is that this max tree is unique in
the search tree. The case $v = f^-(n)$ is dual.

A call $alphabeta(r, -\infty, +\infty)$ computes the exact value of a game tree and is
referred to as the alphabeta algorithm. Using the new postcondition, the following
property can be proved. When a node $n$ is a parameter in a nested call
$alphabeta(n, \alpha, \beta)$ of the alphabeta algorithm, this node is open and alive and
satisfies the relation $h^-(n) = \alpha < \beta = h^+(n)$. Moreover, every node to the left
of $n$ is dead. Consequently, the alphabeta algorithm may be characterized as the
algorithm expanding the leftmost open alive node in each step.
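For orientation, here is a minimal sketch of the alphabeta procedure on an explicit game tree. It is the standard fail-soft formulation, not code from the paper; the `Node` class and the example tree are assumptions made for illustration. The sketch does not record the search tree it builds, so it returns only the value $v$; the paper's postcondition interprets that value as $f^+(n)$, $f(n)$ or $f^-(n)$ in the search tree generated by the call.

```python
import math


class Node:
    """Hypothetical game-tree node: terminals carry a value, others carry children."""

    def __init__(self, kind, value=None, children=None):
        self.kind = kind            # "max" or "min"
        self.value = value
        self.children = children or []


def alphabeta(node, alpha, beta):
    """Fail-soft alphabeta; precondition alpha < beta."""
    assert alpha < beta
    if not node.children:
        return node.value
    if node.kind == "max":
        best = -math.inf
        for child in node.children:
            best = max(best, alphabeta(child, max(alpha, best), beta))
            if best >= beta:        # cut-off: remaining children are dead
                break
        return best
    best = math.inf
    for child in node.children:
        best = min(best, alphabeta(child, alpha, min(beta, best)))
        if best <= alpha:           # cut-off: remaining children are dead
            break
    return best


def min_node(*payoffs):
    """Helper: a min node whose children are terminals with the given payoffs."""
    return Node("min", children=[Node("max", value=v) for v in payoffs])


root = Node("max", children=[min_node(3, 12, 8), min_node(2, 4, 6), min_node(14, 5, 2)])

v = alphabeta(root, -math.inf, math.inf)
print(v)                       # exact game value, since the window is unbounded
print(alphabeta(root, 3, 4))   # v <= alpha: an upper bound on the game value
print(alphabeta(root, 2, 3))   # v >= beta: a lower bound on the game value
```

With the full window the call returns the exact game value; with a narrower window the return value is only a bound unless it falls strictly inside the window.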

5.2 SSS* and MT-SSS 

The SSS*-algorithm was published in 1979 by Stockman fjjjjj. Before 1994, it was 
never used in actual applications. Nevertheless, SSS* has drawn considerable 
attention in literature. The originating paper is one of the top 50 referenced in 
the AI Journal p. A suitable explanation of the algorithm can be found in jSj. It 
was shown recentlyjH] that SSS* is equivalent to a series of alphabeta calls with a 
null-window, i.e., an $\alpha$-$\beta$-window with $\beta = \alpha + 1$. This new formulation is called
MT-SSS and has turned out to be a very convenient and efficient algorithm. The 
above characterization of alphabeta has its counterpart for SSS* and MT-SSS. 
Without going into details, we mention the following properties. When a node n 
is expanded by a null-window call in MT-SSS, then $\beta = h^+(n) = h^+(r) = f^+(r)$,
being a maximal $h^+$-value in the actual search tree according to subsection 4.2.
In addition, every node to the left of $n$ with the same $h^+$-value is dead. Similarly,
the merit for a node in the list maintained during SSS* is equivalent to the
$h^+$-function and, whenever a node $n$ is selected from the list, every node to the left
of $n$ is dead. However, since the $h^-$-value is not taken into account in MT-SSS or
SSS*, aliveness is not guaranteed for a node $n$ when being selected or expanded
during SSS* or MT-SSS. Fortunately, only in very rare situations n is dead. See 
0 for an example. The costs of the search overhead caused by expanding a dead 
node in rare cases does not outweigh the costs of maintaining the $h^-$-function in
the search tree. We are allowed to state that in virtually each step during SSS*
or MT-SSS, the selected or expanded node $n$ is the leftmost open alive node with
maximal $h^+$-value. So a characterization of MT-SSS or SSS* is obtained.

There exist also dual versions of SSS* and MT-SSS, named Dual and MT-DUAL,
where the $h^-$-function is considered instead of the $h^+$-function.
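The reformulation of SSS* as a series of null-window alphabeta calls can be sketched as follows. This is an illustration under several assumptions: payoffs are integers, `upper` is any integer not smaller than the game value, and all names are hypothetical. Practical MT-SSS implementations store the search tree (or a transposition table) between calls so that nodes are not re-expanded; this sketch omits that memory and simply re-searches.

```python
import math


class Node:
    """Hypothetical game-tree node: terminals carry a value, others carry children."""

    def __init__(self, kind, value=None, children=None):
        self.kind = kind
        self.value = value
        self.children = children or []


def alphabeta(node, alpha, beta):
    """Fail-soft alphabeta, used here only with null windows (beta = alpha + 1)."""
    if not node.children:
        return node.value
    if node.kind == "max":
        best = -math.inf
        for child in node.children:
            best = max(best, alphabeta(child, max(alpha, best), beta))
            if best >= beta:
                break
        return best
    best = math.inf
    for child in node.children:
        best = min(best, alphabeta(child, alpha, min(beta, best)))
        if best <= alpha:
            break
    return best


def mt_sss(root, upper):
    """Lower an upper bound on the root value by null-window calls until it is tight."""
    g = upper
    while True:
        gamma = g
        g = alphabeta(root, gamma - 1, gamma)   # null window: beta = alpha + 1
        if g == gamma:                          # the bound could not be lowered
            return g


def min_node(*payoffs):
    return Node("min", children=[Node("max", value=v) for v in payoffs])


root = Node("max", children=[min_node(3, 12, 8), min_node(2, 4, 6), min_node(14, 5, 2)])
print(mt_sss(root, upper=100))
```

Each call that fails low yields a smaller upper bound on the root value, mirroring the way $f^+(r)$ decreases during SSS*; the loop stops at the first call that fails high.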

6 Concluding Remarks 

In this paper we developed a full theory on game tree algorithms, entirely based 
upon the notion solution tree. Stockman’s theorem on solution trees was the basic 
principle underlying the theory. The main points were presented in Section 4. 
As a result we obtained a pruning criterion. In Section 5, we showed how two 
major algorithms fit into the theory. We may say, that MT-SSS and MT-Dual 
are best-first instances, whereas alphabeta is a depth-first instance. 

Two fairly important algorithms are not discussed yet, viz. proof number search 
and Negascout. The role of solution trees in those algorithms was described in 
0 

Abstract. Given $k \ge 3$ heaps of tokens. The moves of the 2-player game
introduced here are to either take a positive number of tokens from at
most $k - 1$ heaps, or to remove the same positive number of tokens from
all the $k$ heaps. We analyse this extension of Wythoff's game and provide
a polynomial-time strategy for it.

1 Introduction

We propose the following two-player game on $k$ heaps with finitely many tokens,
where $k \ge 3$. There are two types of moves: (i) remove a positive number of
tokens from up to $k - 1$ heaps, possibly $k - 1$ entire heaps, or, (ii) remove the
same positive number of tokens from all the $k$ heaps. The player making the last
move wins.

Any position in this game can be described in the following standard form:
$(m_0, \ldots, m_{k-1})$ with $0 \le m_0 \le \ldots \le m_{k-1}$, where $m_i$ is the number of tokens
in the $i$-th heap. Given any game $\Gamma$, we say informally that a P-position is any
position $u$ of $\Gamma$ from which the Previous player can force a win, that is, the
opponent of the player moving from $u$. An N-position is any position $v$ of $\Gamma$
from which the Next player can force a win, that is, the player who moves from
$v$. The set of all P-positions of $\Gamma$ is denoted by $\mathcal{P}$, and the set of all N-positions
by $\mathcal{N}$. Denote by $F(u)$ all the followers of $u$, i.e., the set of all positions that can
be reached in one move from the position $u$. It is then easy to see that:

For every position $u$ of $\Gamma$ we have $u \in \mathcal{P}$ if and only if $F(u) \subseteq \mathcal{N}$;

and $u \in \mathcal{N}$ if and only if $F(u) \cap \mathcal{P} \ne \emptyset$. (1)

For $n \in \mathbb{Z}^0$, denote the $n$-th triangular number by $T_n = \frac{1}{2} n(n+1)$. We prove,

Theorem 1. Every P-position of the game can be written in the form
$(T_n, m_1, \ldots, m_{k-1})$, where the $(k-1)$-tuples $(m_1, \ldots, m_{k-1})$ range over all the
(unordered) partitions of $(k-1)T_n + n$ with parts of size $\ge T_n$. In other words,
$\mathcal{P} = \bigcup_{n=0}^{\infty} P_n$, where

$$P_n = \Big\{ (T_n, m_1, \ldots, m_{k-1}) : \sum_{i=1}^{k-1} m_i = (k-1)T_n + n,\ T_n \le m_1 \le \ldots \le m_{k-1},\ n \in \mathbb{Z}^0 \Big\}. \qquad (2)$$

Example. For $k = 4$,

$$P_n = \{ (T_n, m_1, m_2, m_3) : m_1 + m_2 + m_3 = 3T_n + n,\ n \in \mathbb{Z}^0 \}.$$

The first few P-positions are:

$P_0 = \{(0,0,0,0)\}$

$P_1 = \{(1,1,1,2)\}$

$P_2 = \{(3,3,3,5), (3,3,4,4)\}$

$P_3 = \{(6,6,6,9), (6,6,7,8), (6,7,7,7)\}$

$P_4 = \{(10,10,10,14), (10,10,11,13), (10,10,12,12), (10,11,11,12)\}$

$P_5 = \{(15,15,15,20), (15,15,16,19), (15,15,17,18), (15,16,16,18), (15,16,17,17)\}$.
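The sets $P_n$ are small enough to be enumerated and checked against an exhaustive game search. The sketch below is not from the paper; all function names are illustrative assumptions. It lists the positions given by (2) whose heaps do not exceed a bound, and compares them with the P-positions found by backward induction over all positions within the same bound (such a set of positions is closed under moves, so the comparison is exact).

```python
from itertools import combinations_with_replacement, product


def triangular(n):
    return n * (n + 1) // 2


def formula_p_positions(k, max_heap):
    """Positions (T_n, m_1, ..., m_{k-1}) of form (2) with every heap <= max_heap."""
    result = set()
    n = 0
    while triangular(n) <= max_heap:
        t = triangular(n)
        target = (k - 1) * t + n
        # By Lemma 1 every part lies in [T_n, T_n + n].
        for parts in combinations_with_replacement(range(t, t + n + 1), k - 1):
            if sum(parts) == target and parts[-1] <= max_heap:
                result.add((t,) + parts)
        n += 1
    return result


def followers(pos):
    """All positions reachable in one move, in nondecreasing standard form."""
    k = len(pos)
    out = set()
    # Move (i): lower between 1 and k-1 heaps, leaving at least one untouched.
    for new in product(*(range(h + 1) for h in pos)):
        changed = sum(1 for old, cur in zip(pos, new) if old != cur)
        if 1 <= changed <= k - 1:
            out.add(tuple(sorted(new)))
    # Move (ii): remove the same positive number t from all k heaps.
    for t in range(1, min(pos) + 1):
        out.add(tuple(sorted(h - t for h in pos)))
    return out


def brute_force_p_positions(k, max_heap):
    """P-positions by backward induction: no follower may be a P-position."""
    positions = sorted(combinations_with_replacement(range(max_heap + 1), k), key=sum)
    p_set = set()
    for pos in positions:       # followers have smaller sums, so they come first
        if not any(f in p_set for f in followers(pos)):
            p_set.add(pos)
    return p_set


print(sorted(formula_p_positions(4, 5)))
print(brute_force_p_positions(4, 5) == formula_p_positions(4, 5))
print(brute_force_p_positions(3, 9) == formula_p_positions(3, 9))
```

The first line printed reproduces $P_0$, $P_1$ and $P_2$ of the example; the exhaustive search is exponential in the input size and is meant only as a sanity check for small heaps.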



2 The Proof 



Throughout, as in (2), every $k$-tuple $(T_n, m_1, \ldots, m_{k-1})$, $(m_0, \ldots, m_{k-1})$ or
$(k-1)$-tuple $(m_1, \ldots, m_{k-1})$ is arranged in nondecreasing order. Any of the first two
tuples is also called a position (of the game) or partition (of $kT_n + n$); and the
third is also a partition (of $(k-1)T_n + n$). The terms $m_i$ are called components
(of the tuple) or parts (of the partition).

Lemma 1. Given any partition $(m_1, \ldots, m_{k-1})$ of $(k-1)T_n + n$, where each
part has size $\ge T_n$. Then each part has size $< T_{n+1}$.

Proof. We have,

$$(k-1)T_n + n - m_{k-1} = \sum_{i=1}^{k-2} m_i \ge (k-2)T_n.$$

Hence for all $i \in \{1, \ldots, k-1\}$, $m_i \le m_{k-1} \le T_n + n = T_{n+1} - 1$. ■

Lemma 2. Let $k \ge 3$ and $n \in \mathbb{Z}^0$. Every integer $t$ in the semi-closed interval
$[T_n, T_{n+1})$ appears as a component in some position of $P_n$. It appears in $P_m$
for no $m \ne n$.

Proof. The smallest component in $P_n$ is $T_n$, and by Lemma 1, the largest part
cannot exceed $T_n + n = T_{n+1} - 1$. Hence $t \in [T_n, T_{n+1})$ appears as a component
in $P_m$ for no $m \ne n$. Let $t \in [T_n, T_{n+1})$, say $t = T_n + j$, $0 \le j \le n$. Then for $k \ge 3$,
$T_n + j$ appears in the partition $\{m_1, \ldots, m_{k-1}\} = \{T_n^{k-3}, T_n + n - j, T_n + j\}$ of
$(k-1)T_n + n$, where $T_n^{k-3}$ denotes $k - 3$ copies of $T_n$, and so $T_n + j$ appears in
some position of $P_n$. ■
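The construction in the proof of Lemma 2 is explicit, so a position of $P_n$ containing a given integer $t$ can be written down directly. The following sketch uses a hypothetical function name, `lemma2_witness`, and assumes $k \ge 3$ and $t \ge 0$.

```python
from math import isqrt


def lemma2_witness(t, k):
    """Return a position of P_n having t as a component, as built in Lemma 2."""
    if k < 3 or t < 0:
        raise ValueError("requires k >= 3 and t >= 0")
    n = (isqrt(1 + 8 * t) - 1) // 2      # the unique n with T_n <= t < T_{n+1}
    t_n = n * (n + 1) // 2
    j = t - t_n                          # 0 <= j <= n
    parts = [t_n] * (k - 3) + [t_n + n - j, t_n + j]
    return (t_n,) + tuple(sorted(parts))


print(lemma2_witness(4, 4))    # a position of P_2 containing 4
print(lemma2_witness(19, 4))   # a position of P_5 containing 19
```

Both results appear in the list of P-positions given for $k = 4$ in the example of Section 1.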

Proof of Theorem 1. It follows from (1) that it suffices to show two things:
(I) A player moving from any position in $P_n$ lands in a position which is in $P_m$
for no $m$. (II) From any position which is in $P_m$ for no $m$, there is a move to
some $P_n$, $n \in \mathbb{Z}^0$. The fact that (I) and (II) suffice in general for characterizing
$\mathcal{P}$ and $\mathcal{N}$ is shown in [6] for the case of games without cycles, based on a formal
definition of the P- and N-positions, and a proof of (1). (It is not true for cyclic
games: given a digraph consisting of two vertices $u$ and $v$, and an edge from $u$
to $v$, and an edge from $v$ to $u$. Place a token on $u$. The two players alternate in
pushing the token to a follower. The outcome is clearly a draw, since there is no
last move. However, putting $\mathcal{P} = \{u\}$, $\mathcal{N} = \{v\}$ satisfies (1).)

(I) Let $P_n$ be any $k$-tuple of the form (2). Removing tokens from up to $k - 1$
heaps, including the first heap, results in a position $Q$ such that the first element
is in $P_j$ for some $j < n$, yet there is a heap whose size is a component in $P_n$.
Thus $Q \in P_m$ for no $m$ by Lemma 2. Removing tokens from up to $k - 1$ heaps,
excluding the first heap, results in a position $Q$ whose last $k - 1$ components
sum to a number $< (k-1)T_n + n$. Since, however, the first component is $T_n$,
$Q$ is not of the form (2). Hence $Q \in P_m$ for no $m$.

So consider the move from $P_n$ which results in $Q = (T_n - t, m_1 - t, \ldots, m_{k-1} - t)$
for some $t \in \mathbb{Z}^+$. If $Q \in P_m$ for some $m < n$, then $T_n - t = T_m$. Then
$(T_n - t) + (m_1 - t) + \ldots + (m_{k-1} - t) = kT_n + n - kt = kT_m + m$. Thus,
$0 = k(T_n - T_m - t) = m - n < 0$, a contradiction. Hence $Q \in P_m$ for no $m$.

(II) Let $(m_0, \ldots, m_{k-1})$ be any position which is in $P_m$ for no $m$. Since
$\bigcup_{n=0}^{\infty} [T_n, T_{n+1})$ is a partition of $\mathbb{Z}^0$, we have $m_0 \in [T_n, T_{n+1})$ for precisely one
$n \in \mathbb{Z}^0$. Put $L = \sum_{i=1}^{k-1} m_i$.

Case (i). $m_0 = T_n$. If $L > (k-1)T_n + n$, then removing $L - (k-1)T_n - n$
from a suitable subset of $\{m_1, \ldots, m_{k-1}\}$ results in a position in $P_n$. So suppose
that $L < (k-1)T_n + n$. Then $L = (k-1)T_n + j$ for some $j \in \{0, \ldots, n-1\}$.
Subtracting $T_n - T_j$ from all components then leads to a position in $P_j$. Indeed,
$m_0 - (T_n - T_j) = T_j$, and $\sum_{i=1}^{k-1} (m_i - (T_n - T_j)) = (k-1)T_j + j$.

Case (ii). $T_n < m_0 < T_{n+1}$, say $m_0 = T_n + j$, $j \in \{1, \ldots, n\}$. Suppose first
that $L \ge (k-1)T_n + n + j$. If $m_1 < T_{n+1}$, subtract $j$ from $m_0$ to get to $T_n$.
By the first part of Lemma 2, $m_1$ is a part in some partition of $(k-1)T_n + n$.
Then reduce, if necessary, a subset of the $m_i$ for $i > 1$, so that
$m_1 + \sum_{i=2}^{k-1} m'_i = (k-1)T_n + n$. Here and below, $m'_i$ denotes $m_i$ after a suitable
positive integer may have been subtracted from it. If $m_1 \ge T_{n+1}$, then decrease $m_1$
to $T_n$. Then
$m_0 + T_n + \sum_{i=2}^{k-1} m_i \ge T_n + j + T_n + (k-2)T_{n+1} \ge kT_n + (k-2)(n+1) + 1 \ge kT_n + n + 2 > kT_n + n$,
since $k \ge 3$. Again by Lemma 2, $m_0$ is a part in some
partition of $(k-1)T_n + n$. So reducing, if necessary, a subset of the $m_i$ for $i \ge 2$,
we get $m_0 + \sum_{i=2}^{k-1} m'_i = (k-1)T_n + n$.

So consider the case $L \le (k-1)T_n + n + j$. We claim that subtracting
$m_0 - T_m$ from all components of $(m_0, \ldots, m_{k-1})$ leads to a position in $P_m$,
where $m = L - (k-1)m_0$. First note that $m = \sum_{i=1}^{k-1} m_i - (k-1)m_0 \ge 0$, and
$m = L - (k-1)m_0 \le (k-1)T_n + n + j - (k-1)m_0 = n - (k-2)j \le n - j < n$
(since $k \ge 3$), so $0 \le m < n$, as required. Secondly, $m_0 - (m_0 - T_m) = T_m$, and
$\sum_{i=1}^{k-1} (m_i - (m_0 - T_m)) = L - (k-1)(m_0 - T_m) = (k-1)T_m + m$. (Note that
for $L = (k-1)T_n + n + j$ we provided two winning moves. The second leads to
a win faster than the first.)

In conclusion, we see that $\bigcup_{i=0}^{\infty} P_i = \mathcal{P}$. ■

3 Aspects of the Strategy 

We observe that the statement of Theorem 1 tells a player whether or not it is 
possible to win by moving from any given position. The proof of the theorem 
shows how to compute a winning move, if it exists. Together they form a strategy 
for the game. 

The strategy can, in fact, be computed in polynomial time. Given any position
$Q = (m_0, \ldots, m_{k-1})$ of the game. Its input size is $O\big(\sum_{i=0}^{k-1} \log m_i\big)$.
Solving $m_0 = n(n+1)/2$ leads to $n = \lfloor (\sqrt{1 + 8m_0} - 1)/2 \rfloor$. By Theorem 1,
$Q \in \mathcal{P}$ if and only if $m_0 = T_n$, where $n = (\sqrt{1 + 8m_0} - 1)/2$ is an integer,
and $\sum_{i=1}^{k-1} m_i = (k-1)T_n + n$. Otherwise $Q \in \mathcal{N}$, and the proof of Theorem 1
indicates how to compute a winning move to a $P_n$-position. All of this can be
done in time which is polynomial in the input size.
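The membership test described above takes only a few integer operations. Here is a minimal sketch; the function name `is_p_position` is an assumption, and `math.isqrt` is used so that the square root is computed exactly on integers rather than in floating point.

```python
from math import isqrt


def is_p_position(heaps):
    """Decide whether a position is a P-position, following Theorem 1."""
    m = sorted(heaps)                       # standard form m_0 <= ... <= m_{k-1}
    k = len(m)
    n = (isqrt(1 + 8 * m[0]) - 1) // 2      # n = floor((sqrt(1 + 8 m_0) - 1) / 2)
    t_n = n * (n + 1) // 2
    # m_0 must be triangular and the other heaps must sum to (k-1) T_n + n.
    return m[0] == t_n and sum(m[1:]) == (k - 1) * t_n + n


print(is_p_position((3, 4, 4, 3)))   # True: (3,3,4,4) belongs to P_2
print(is_p_position((2, 2, 4)))      # False: 2 is not a triangular number
```

The condition that every part is at least $T_n$ holds automatically once the heaps are sorted and the smallest one equals $T_n$.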

It is also of interest to estimate the density of the P-positions in the set of
all game positions. Subtracting $T_n - 1$ from each $m_i$ in the sum of (2), we get
partitions of the form

$$x_1 + \ldots + x_{k-1} = n + k - 1, \qquad 1 \le x_1 \le \ldots \le x_{k-1} \le n + 1,$$

where $x_i = m_i - (T_n - 1)$. The number $p_{k-1}(n + k - 1)$ of partitions of $n + k - 1$ into
$k - 1$ positive integer parts is estimated in jJJ Ch. 4]. It is a polynomial of degree
$k - 1$ in $n + k - 1$, whose leading term is $(n + k - 1)^{k-2}/(k-2)!$. Thus the number
of positions in $P_n$ for $n \le N$ is estimated by $\pi(N) = \sum_{n=0}^{N} (n + k - 1)^{k-2}/(k-2)!$.

It is easy to see that

$$\int_{-1}^{N} \frac{(x + k - 1)^{k-2}}{(k-2)!}\, dx < \pi(N) < \int_{0}^{N+1} \frac{(x + k - 1)^{k-2}}{(k-2)!}\, dx,$$

leading to

$$\frac{(N + k - 1)^{k-1} - (k-2)^{k-1}}{(k-1)!} < \pi(N) < \frac{(N + k)^{k-1} - (k-1)^{k-1}}{(k-1)!}.$$

The total number of positions up to $P_N$ is the number of partitions of the form
$m_0 + \ldots + m_{k-1} = n$, $0 \le m_0 \le \ldots \le m_{k-1}$, where $n$ ranges from 0 to $kT_N + N$.
Adding 1 to all the parts, we get partitions of the form $x_0 + \ldots + x_{k-1} = n + k$,
$1 \le x_0 \le \ldots \le x_{k-1} \le n + k$, whose number is $p_k(n + k)$. As above, the total
number of positions is thus estimated by $\nu(N) = \sum_{n=0}^{kT_N + N} (n + k)^{k-1}/(k-1)!$.

Using integration as above, we get

$$\frac{(kT_N + N + k)^k - (k-1)^k}{k!} < \nu(N) < \frac{(kT_N + N + k + 1)^k - k^k}{k!}.$$

For large $N$, the ratio is thus about

$$\frac{\pi(N)}{\nu(N)} \approx \frac{k}{kT_N + N + k} \left( \frac{N + k}{kT_N + N + k} \right)^{k-1}.$$

Dividing the numerator and denominator of the second fraction by $N^{k-1}$ results
in $\pi(N)/\nu(N) = O(1/N^{k+1})$. We see that the P-positions are rather rare, so our
game sticks to the majority of games in the sense of (2] and cnj. The rareness 
of P-positions in general is, in fact, consistent with the intuition suggested by
(1): a position is in $\mathcal{P}$ if and only if all of its followers are in $\mathcal{N}$, whereas for a
position to be in $\mathcal{N}$ it suffices that one of its followers is in $\mathcal{P}$. The scarcity of
the P-positions is the reason why game strategies are usually specified in terms
of their P-positions, rather than in terms of their N-positions.



4 Epilogue 

In the heap games known to us, such as those discussed in Q, the moves are 
restricted to a single heap (which might, in special cases, be split into several 
subheaps). We know of two exceptions. One is Moore's $\text{Nim}_k$ 0, where up to
$k$ heaps can be reduced in a single move (so $\text{Nim}_1$ is ordinary Nim). The other
is Wythoff’s game, Wyt, cn, ei. □, na, where a move may affect up to two 
heaps. The motivation for the present note was to extend Wythoff’s game to 
more than two heaps. 

Wyt is played on two heaps. The moves are to either remove any positive 
number of tokens from a single heap, or to remove the same positive number of 
tokens from both heaps. Denoting by $(x, y)$ the positions of Wyt, where $x$ and
$y$ denote the number of tokens in the two heaps with $x \le y$, the first eleven
P-positions are listed in Table 1. The reader may wish to guess the next few 
entries of the table before reading on. 

For any finite subset $S \subset \mathbb{Z}^0$, define the Minimum EXcluded value of $S$ as
follows: $\text{mex}\, S = \min(\mathbb{Z}^0 \setminus S)$ = least nonnegative integer not in $S$ Qj. Note that
if $S = \emptyset$, then $\text{mex}\, S = 0$. The general structure of Table 1 is given by:

$$A_n = \text{mex}\{A_i, B_i : 0 \le i < n\}, \qquad B_n = A_n + n \qquad (n \in \mathbb{Z}^0).$$

Since the input size of Wyt is succinct, namely $O(\log(x + y))$, one can see
that the above characterization of the P-positions implies a strategy which is
exponential. A polynomial strategy for Wyt can be based on the observation
that $A_n = \lfloor n\alpha \rfloor$, $B_n = \lfloor n\beta \rfloor$, where $\alpha = (1 + \sqrt{5})/2$ is the golden section,
$\beta = (3 + \sqrt{5})/2$. Another polynomial strategy depends on a special numeration
system whose basis elements are the numerators of the simple continued fraction
expansion of $\alpha$. These three strategies can be generalized to $\text{Wyt}_a$, proposed and
analysed in [2], where $a \in \mathbb{Z}^+$ is a parameter of the game. The moves are as in
Wyt, except that the second type of move is to remove say $k > 0$ and $l > 0$ from
the two heaps subject to $|k - l| < a$. Clearly $\text{Wyt}_1$ is Wyt.
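Both characterizations of the P-positions of Wyt are easy to compare by computation. The sketch below uses illustrative function names that are not from the paper. It generates the pairs $(A_n, B_n)$ once by the mex recurrence and once by the closed form, where $\lfloor n\alpha \rfloor$ is evaluated exactly as $\lfloor (n + \lfloor \sqrt{5n^2} \rfloor)/2 \rfloor$ with integer arithmetic instead of floating point.

```python
from math import isqrt


def mex(values):
    """Least nonnegative integer not contained in values."""
    seen = set(values)
    m = 0
    while m in seen:
        m += 1
    return m


def wythoff_by_mex(count):
    """First `count` pairs (A_n, B_n) from A_n = mex{A_i, B_i : i < n}, B_n = A_n + n."""
    pairs, used = [], set()
    for n in range(count):
        a = mex(used)
        b = a + n
        pairs.append((a, b))
        used.update((a, b))
    return pairs


def wythoff_closed_form(n):
    """(floor(n * alpha), floor(n * beta)) with alpha the golden section, beta = alpha + 1."""
    a = (n + isqrt(5 * n * n)) // 2     # exact floor of n (1 + sqrt 5) / 2
    return (a, a + n)                   # floor(n * beta) = floor(n * alpha) + n


print(wythoff_by_mex(11))
print(wythoff_by_mex(200) == [wythoff_closed_form(n) for n in range(200)])
```

The recurrence needs all earlier pairs and is therefore exponential in the input size $O(\log(x + y))$, whereas the closed form uses only a fixed number of arithmetic operations on integers of that size.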

Table 1. The first few P-positions of Wyt.

  n   A_n   B_n
  0     0     0
  1     1     2
  2     3     5
  3     4     7
  4     6    10
  5     8    13
  6     9    15
  7    11    18
  8    12    20
  9    14    23
 10    16    26



The generalization of Wyt to more than two heaps was a long sought-after
problem. In jSj it is shown that the natural generalization to the case of $k \ge 2$
heaps is to either remove any positive number of tokens from a single heap,
or say $l_1, \ldots, l_k$ from all of them simultaneously, where the $l_i$ are nonnegative
integers with $\sum_{i=1}^{k} l_i > 0$ and $l_1 \oplus \ldots \oplus l_k = 0$, and where $\oplus$ denotes Nim-sum
(also known as addition over GF(2), or XOR). In particular, the case $k = 2$ is
Wyt. But the actual computation of the P-values seems to be difficult.

The heap-game considered here is a generalization of the moves of Wyt, but 
not of its strategy. In fact, it doesn’t specialize to the case k = 2; we used the 
fact that k > 3 in several places of the proof. However, the P-positions of the 
present game have a compact form, the exhibition of which was the purpose of 
this note. 

We remark finally that the Sprague-Grundy function $g$ of a game provides a
strategy for the sum of several games. The computation of $g$ for $\text{Nim}_k$, $k \ge 2$,
and $\text{Wyt}_a$, $a \ge 1$, seems to be difficult. It would be of interest to compute the
$g$-function for the present game. Perhaps this is also difficult.

Abstract. We define the family of locally path-bounded digraphs, which
is a class of infinite digraphs, and show that on this class it is relatively 
easy to compute an optimal strategy (winning or nonlosing); and realize 
a win, when possible, in a finite number of moves. This is done by prov- 
ing that the Generalized Sprague-Grundy function exists uniquely and 
has finite values on this class. 

Keywords: infinite cyclic games, locally path-bounded digraphs, gener- 
alized Sprague-Grundy function 



1 Introduction 

We are concerned with combinatorial games, which, for our purposes here,
comprise 2-player games with perfect information, no chance moves and outcome
restricted to (lose, win), (draw, draw) for the two players. A draw position is 
a position in the game such that no win is possible from it, but there exists a 
next move which guarantees, for the player making it, not to lose. You win a 
game by making a last move in it. A game is impartial if for every position in 
it, both players have the same set of next moves; otherwise it’s partizan. Nim is 
impartial, chess partizan. A game is cyclic if it contains cycles (the possibility of 
returning to the same position), or loops (pass-positions). These notions, slightly 
changed here, can be found in ITJ. It is clear that a necessary (yet not sufficient) 
condition for the existence of draw positions is that the game be cyclic. 

For Partizan cyclic games, see 0, H2I, 0, cm, 0; finite impartial cyclic 
games are discussed only briefly in P,0 Particular finite impartial cyclic games 
are analyzed in PJ , [7 . Infinite impartial games are treated briefly at the end of 
PI, where both the “generalized Sprague-Grundy function” $\gamma$, defined below,
and its associated “counter function” were permitted to be transfinite ordinals. 

Our purpose here is to define a certain class of infinite digraphs on which
$\gamma$ assumes only finite values, but the counter function may contain transfinite
ordinal values. The motivation for doing this is based, in part, on the following
considerations. It is easier to compute with finite than with transfinite ordinals.
For just determining who of the two players can win (or that both can at most
draw) in a sum of games, the “generalized Nim-sum” of a finite set of $\gamma$-values
seems to be required (§4). The generalized Nim-sum is based on the binary
expansion of ordinals. It's easy to see that every ordinal, finite or transfinite,
has a unique expansion as a finite sum of powers of ordinals (based on the
greedy algorithm and the fact that the ordinals are well-ordered; see m xj v,
§19]). For example, $\omega = 2^{\omega}$. We do not wish to enter here into the question
of the computational complexity of computing with transfinite ordinals. But it
seems possible that it's easier to compare the size of ordinals with each other,
which suffices for counter function values, than to compute and work with their
binary expansions, as needed for the $\gamma$-values. The counter function helps for
consummating a win, and this can be done in a finite number of moves even
if the counter function value is a transfinite ordinal, since the ordinals are
well-ordered. Moreover, often the structure of the digraph is such that the $\gamma$-function
itself suffices to provide a winning strategy, without the need of an additional
counter function (§4).

The connection between games and digraphs is simple: with any impartial
game $\Gamma$ we associate a digraph $G = (V, E)$ where $V$ is the set of positions of
$\Gamma$ and $(a, b) \in E$ if and only if there is a move from position $a$ to position
$b$. It is called the game-graph of $\Gamma$. We identify games with their corresponding
game-graphs, game positions with digraph vertices and game moves with digraph
edges, using them interchangeably. It is thus natural to define a cyclic digraph
as a digraph, finite or infinite, which may contain cycles or loops.

In §2 we provide basic tools needed for the statement and proof of the result 
(Theorem 1), and §3 contains the proof. An example demonstrating Theorem 1 
is given in the final §4. 

2 Preliminaries 

The subset of nonnegative integers is denoted by $\mathbb{Z}^0$, and the subset of positive
integers by $\mathbb{Z}^+$.

Given a digraph $G = (V, E)$. For any vertex $u \in V$, the set of followers
of $u$ is $F(u) = \{v \in V : (u, v) \in E\}$. A vertex $u$ with $F(u) = \emptyset$ is a leaf. The
set of predecessors of $u$ is $F^{-1}(u) = \{w \in V : (w, u) \in E\}$. A walk in $G$ is any
sequence of vertices $u_1, u_2, \ldots$, not necessarily distinct, such that $(u_i, u_{i+1}) \in E$,
i.e., $u_{i+1} \in F(u_i)$ ($i \in \mathbb{Z}^+$). Edges may be repeated. A path is a walk with all
vertices distinct. In particular, there's no repeated edge in a path. The length of
a path is the number of its edges. If every path in $G$ has finite length, then $G$
is called path-finite. If there exists $b \in \mathbb{Z}^0$ such that every path in $G$ has length
$\le b$, then $G$ is path-bounded.

Definition 1. A cyclic digraph is locally path-bounded if for every vertex $u_i$
there is a bound $b_i(u_i) = b_i \in \mathbb{Z}^0$ such that the length of every (directed) path
emanating from $u_i$ doesn't exceed $b_i$. The integer $b_i$ is the local path bound of
$u_i$.

Note that every path-bounded digraph is locally path-bounded, and every
locally path-bounded digraph is path-finite. But neither of the two inverse
relationships needs to hold. Our main result is concerned with locally path-bounded
digraphs.

Given a digraph G = (V,E). The Generalized Sprague-Grundy function,
also called γ-function, is a mapping γ: V → Z^0 ∪ {∞}, where the symbol ∞
indicates a value larger than any natural number. If γ(u) = ∞, we say that γ(u)
is infinite. We wish to define γ also on certain subsets of vertices. Specifically:
γ(F(u)) = {γ(v) < ∞: v ∈ F(u)}. If γ(u) = ∞ and γ(F(u)) = K, we also write
γ(u) = ∞(K). Next we define equality of γ(u) and γ(v): if γ(u) = k and γ(v) = ℓ
then γ(u) = γ(v) if one of the following holds: (a) k = ℓ < ∞; (b) k = ∞(K),
ℓ = ∞(L) and K = L. We also use the notations

\[ V^f = \{u \in V : \gamma(u) < \infty\}, \qquad V^\infty = V \setminus V^f, \]

\[ \gamma'(u) = \operatorname{mex}\,\gamma(F(u)) = \operatorname{mex}\{\gamma(v) < \infty : v \in F(u)\}, \tag{1} \]

where for any finite subset S ⊂ Z^0, the Minimum EXcluded value mex is defined
by

\[ \operatorname{mex} S = \min(Z^0 \setminus S) = \text{minimum term in } Z^0 \text{ not in } S. \]
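The mex operator just defined is easy to state in code. The following is a minimal sketch in Python; the function name `mex` and the use of a Python set for S are choices made here for illustration, not notation from the paper.

```python
from typing import Iterable


def mex(values: Iterable[int]) -> int:
    """Return the smallest nonnegative integer not contained in `values`.

    `values` must be a finite collection of nonnegative integers,
    matching the finite subset S of Z^0 in the definition.
    """
    seen = set(values)
    candidate = 0
    while candidate in seen:
        candidate += 1
    return candidate


print(mex([]))         # 0 (a leaf has no followers)
print(mex([0, 1, 3]))  # 2
print(mex([1, 2]))     # 0
```

Note that γ′(u) applies mex only to the finite γ-values among the followers of u, so followers labeled ∞ would have to be filtered out before calling such a function.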

We need some device to tell the winner where to go when we use the
γ-function. For example, suppose that there is a token on vertex u (Fig. 1). It
turns out that it’s best for the player moving now to go to a position with
γ-value 0. There are two such values: one (the leaf) is an immediate win, and
the other (v) is only a nonlosing move. This digraph may be embedded in a
large digraph where it’s not clear which option leads to a win. The device which
overcomes this problem is a counter function, as used in the following definition.
For realizing an optimal strategy, we will normally select a follower of least
counter function value with specified γ-value. The counter function also enables
us to prove assertions by induction.

Definition 2. Given a cyclic digraph G = (V, E). A function γ: V → Z^0 ∪ {∞}
is a γ-function with counter function c: V^f → J, where J is any infinite
well-ordered set, if the following three conditions hold:

A. If γ(u) < ∞, then γ(u) = γ′(u).

B. If there exists v ∈ F(u) with γ(v) > γ(u), then there exists w ∈ F(v)
satisfying γ(w) = γ(u) and c(w) < c(u).

C. If γ(u) = ∞, then there is v ∈ F(u) with γ(v) = ∞(K) such that
γ′(u) ∉ K.

Remarks. 

• In B we have necessarily u ∈ V^f; and we may have γ(v) = ∞ as in C.

• To make condition C more accessible, we state it also in the following
equivalent form:

C’. If for every v ∈ F(u) with γ(v) = ∞ there is w ∈ F(v) with γ(w) =
γ′(u), then γ(u) < ∞.

• If condition C’ is satisfied, then γ(u) < ∞, and so by A, γ(w) = γ′(u) =
γ(u).

• To keep the notation simple, we write ∞(0), ∞(1), ∞(0, 1) etc., for ∞({0}),
∞({1}), ∞({0, 1}), etc.

The γ-function was first defined in HU-. It was found independently in jEi].
The simplified version given above, and two other versions, appear in Em-. Since
this function is not well-known, we repeated its definition above. The γ-function
exists uniquely on any finite cyclic digraph, but its associated counter function
exists nonuniquely.

We say that a digraph G has a γ-function, if γ exists on G with values
restricted to Z^0 ∪ {∞}, i.e., no transfinite ordinal values.

We are now ready to state our main result. 

Theorem 1. Every locally path-bounded digraph G = (V,E) has a unique
γ-function with an associated counter function; and for every u ∈ V^f, γ(u) doesn’t
exceed the length of a longest path emanating from u. The statement doesn’t hold
in general for path-finite digraphs.



3 The Proof 

We wish to examine some properties of path-finite and locally path-bounded
digraphs. To begin with, is it clear that for a path-finite digraph, if v ∈ F(u),
then no path emanating from v is longer than a longest path emanating from
u?

Perhaps it is clear, but it’s also wrong: in a path-finite graph, every path 
originating at some vertex u and continuing to its ultimate end, terminates at a 
vertex v, where v is either a leaf or a predecessor of some w on the path. Thus 
a path of minimum length emanating from u in Fig. 1 terminates at the leaf, 
whereas a path of maximum length beginning at u terminates at y. It has length 
3. But a maximal-length path emanating from v ∈ F(u) clearly has length 4.



Fig. 1. The numbers are γ-values.



If a digraph G = (V,E), possibly with infinite paths, has no leaf, then the
label ∞ on all the vertices is evidently a γ-function: A and B are satisfied
vacuously, and C is satisfied with γ′(u) = 0 for all u ∈ V. If G has a leaf, then
some of the vertices have a γ-function, such as the leaf and its predecessors, but
possibly γ doesn’t exist on some of the vertices. For the case where γ exists on a
subset V′ ⊆ V, we define γ′(u) = mex{γ(F(u)) : F(u) ⊆ V′}. Since F(u) may,
nevertheless, be infinite for any vertex u in a locally path-bounded digraph, it is
not clear a priori that γ′(u) exists. The following lemma takes care of this point.

Lemma 1. Let u be any vertex with local path bound b in a locally path-bounded
digraph G = (V,E). Then γ′(u) exists (i.e., it is a nonnegative integer), and in
fact, γ′(u) ≤ b.

Proof. We consider two cases. 

(i) Suppose that u has finite γ-value m. Then γ′(u) exists, and in fact, γ′(u) =
γ(u) = m by A. Moreover, there exists u_1 ∈ F(u) with γ(u_1) = m − 1, there
exists u_2 ∈ F(u_1) with γ(u_2) = m − 2, ..., there exists u_m ∈ F(u_{m−1}) with
γ(u_m) = 0. Then u, u_1, ..., u_m is a path of length m, so m ≤ b. (The path may
continue beyond u_m, but in any case γ′(u) = m ≤ b.)

(ii) Suppose that u has either no γ-value or value ∞. It suffices to show that
if v ∈ F(u) ∩ V^f, then γ(v) < b. Indeed, |F(u)| may be infinite, and F(u) may
contain vertices with no γ-value. But if γ(v) < b for all v ∈ F(u) ∩ V^f then
clearly γ′(u) exists and γ′(u) ≤ b. Note that we cannot use the argument of
case (i) directly on v, since a path from v may be longer than a path from u, as
we just saw. So suppose there is v_0 ∈ F(u) ∩ V^f with γ(v_0) = n ≥ b. As in case
(i), there is a path v_0, v_1, ..., v_n of length n with γ(v_i) = n − i (i ∈ {0, ..., n}).
Then u, v_0, v_1, ..., v_n is a walk of length n + 1 > b emanating from u. Hence it
cannot be a path. But v_i ≠ v_j for i ≠ j, since γ(v_i) ≠ γ(v_j). Hence v_j = u
for some j ∈ {0, ..., n}. The contradiction is that v_j does and u doesn’t have a
finite γ-value. Thus γ(v) < b for all v ∈ F(u) ∩ V^f, hence γ′(u) exists, and in
fact, γ′(u) ≤ b. ■

Proof of Theorem 1. Let V′ ⊆ V be a maximal subset of vertices on which
γ exists, together with an associated counter function c, subject to the following
additions to B and C of Definition 2:

If γ(u) < ∞ and there is v ∈ F(u) ∩ V^ν,
then there is w ∈ F(v) with γ(w) = γ(u), c(w) < c(u), (2)

if γ(u) = ∞, then there is v ∈ F(u) with γ(v) = ∞
such that w ∈ F(v) ∩ V^ν ⟹ γ′(w) ≠ γ′(u), (3)

where V^ν = V \ V′. (In (3) we have γ′(w) ≠ γ′(u), instead of w ∈ F(v) and
γ(w) ≠ γ′(u) in C.) In addition we require:

If γ(u) = ∞ with γ′(u) = ℓ then γ′(v) ≥ ℓ for v ∈ V^ν. (4)

The subset V′ is maximal in the sense that adjoining any u ∈ V^ν into V′ violates
either Definition 2, or (2) or (3) or (4). If V^ν ≠ ∅, let u ∈ V^ν. By Lemma 1,
γ′(u) = k exists for some k ∈ Z^0. It follows that there is a minimum value
m = min{k ∈ Z^0 : u ∈ V^ν, γ′(u) = k}. Let K = {u ∈ V^ν : γ′(u) = m}. Then
V^ν ≠ ∅ ⟹ K ≠ ∅. We consider four cases.

Case 1. For every u ∈ K we have m ∈ γ′(F(u)), where, consistent with
(1),

\[ \gamma'(F(u)) = \{\gamma'(v) : v \in F(u)\} = \{\operatorname{mex}\,\gamma(F(v)) : v \in F(u)\} = \{\operatorname{mex}\{\gamma(w) < \infty : w \in F(v)\} : v \in F(u)\}. \]

Note that u ∈ K, v ∈ F(u) ⟹ γ(v) ≠ m by the definition of mex, so v ∈ F(u),
γ′(v) = m ⟹ v ∈ V^ν, in fact, v ∈ K. Thus putting γ(u) = ∞ for all u ∈ K
satisfies C, and is also consistent with (3); and with (4) by the minimality of
m. Furthermore, it doesn’t violate A, and is consistent with B by (2). This
contradicts the maximality of V′.

We may thus assume henceforth that there exists u ∈ K such that

\[ m \notin \gamma'(F(u)). \tag{5} \]

Case 2. There exist u ∈ K and v ∈ F(u) with γ(v) = ∞, such that for
every w ∈ F(v), either γ(w) ≠ m or w ∈ V^ν with γ′(w) ≠ m. Putting γ(u) = ∞
is clearly consistent with C, and (3); and it doesn’t violate A. In view of (2),
also B is satisfied. This contradicts the maximality of V′. So we may assume
that

∀u ∈ K and ∀v ∈ F(u) with γ(v) = ∞,
∃w ∈ F(v) (with γ(w) = m or w ∈ V^ν with γ′(w) = m).

We subdivide this into the following two cases:

∃u ∈ K such that ∀v ∈ F(u) with γ(v) = ∞,
∃w ∈ F(v) with γ(w) = m, (6)

or

∀u ∈ K and ∀v ∈ F(u) with γ(v) = ∞,
∃ no w ∈ F(v) ∩ V′, but ∃w ∈ F(v) ∩ V^ν with γ′(w) = m. (7)

Case 3. (6) holds. We repeat that for any u ∈ K, since γ′(u) = m, u has no
follower with γ-value m. Suppose that there exists y ∈ F^{-1}(u) with γ(y) = m.
Then by (2), there exists v ∈ F(u) with γ(v) = m, contradicting γ′(u) = m. Thus
putting γ(u) = m is consistent with A. It is also consistent with (3): putting
γ(u) = m could presumably increase γ′(y) for some y ∈ F^{-1}(u) ∩ V^ν, and thus
upset (3) for the value γ(z) = ∞ of some grandparent z ∈ F^{-1}(F^{-1}(y)) of y.
Now by (4), γ′(u) ≥ γ′(z). If indeed γ′(y) increased, then for the new value
we have γ′(y) > γ′(u), so γ′(y) > γ′(z) and γ(z) = ∞ remains unaffected.
Consistency with C thus follows from (3) which becomes C when u is labeled
m. Since y ∈ F^{-1}(u) ⟹ γ(y) ≠ m as we saw at the beginning of this case,
the potential adverse effect on any grandparent z of y considered above, cannot
happen.

We now show that also B holds. Suppose first that F(u) ⊆ V′. For every
v ∈ F(u) for which γ(v) > m, there exists w ∈ F(v) with γ(w) = m. This
follows from A if γ(v) < ∞, and from (6) if γ(v) = ∞. It remains to define c(u)
sufficiently large so that c(w) < c(u). This will be done below.

In view of the minimality of m and by (5), the second possibility is that for
every v ∈ F(u) for which v ∈ V^ν, we have γ′(v) > m. For every such v there
exists w ∈ F(v) with γ(w) = m by the definition of mex. Again we have to
define c(u) sufficiently large to satisfy c(w) < c(u).

Let S = {v ∈ F(u) : γ(v) > m} ∪ {v ∈ F(u) : v ∈ V^ν, γ′(v) > m}. We
have just seen that for every v ∈ S there is w ∈ F(v) with γ(w) = m. Put
T = {w ∈ F(v) : v ∈ S, γ(w) = m}. Let c(u) be the smallest ordinal > c(w) for
all w ∈ T. Then also B is satisfied. This contradicts the presumed maximality
of V′.

Note that the case F(u) ⊆ V \ V^∞ satisfies (6) vacuously, and so is also
included in the present case.

Case 4. (7) holds. If (7) holds nonvacuously, then as in Case 1, putting
γ(u) = ∞ for all u ∈ K is consistent with C, (3) and the other conditions. This
contradicts again the maximality of V′. Hence K = ∅, and so also V^ν = ∅.

Whenever γ exists on a digraph, finite or infinite, it exists there uniquely.
See [OJ. Finally, if b is the local bound of u ∈ V, then γ′(u) ≤ b by Lemma 1.
Hence if u ∈ V^f, then γ(u) ≤ b by A.

For proving the last statement of the theorem, consider the digraph G which
consists of a vertex u, and F(u) = {u_0, u_1, ...}, where, for all i ∈ Z^0, u_i is the
top vertex of a Nim-heap of size i, so γ(u_i) = i (i ≥ 0). It is clear that G is
path-finite but not locally path-bounded. Also γ(F(u)) = {0, 1, ...}, so γ(u) cannot
assume any finite value. In fact, γ(u) = ω, where ω is the smallest transfinite
ordinal bigger than all the natural numbers. If, however, F(u_i) = u_{i−1}, and
F(u) = {u_0, u_1, ...} as above, then again G is path-finite but not locally
path-bounded, yet γ(u) = 2. ■
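The two digraphs used for the last statement can be explored on finite truncations. The sketch below is only an illustration under an assumption made here: F(u) is cut off at u_0, ..., u_n, so every value stays finite. The function names are hypothetical. With Nim-heaps below u the value of u grows with n (which is why no finite value works in the limit), whereas with the chain F(u_i) = u_{i−1} it stays at 2.

```python
def mex(values):
    """Smallest nonnegative integer not in the finite collection `values`."""
    seen = set(values)
    m = 0
    while m in seen:
        m += 1
    return m


def gamma_top_with_heaps(n):
    """u has followers u_0..u_n; u_i tops a Nim-heap of size i, so its value is i."""
    return mex(range(n + 1))


def gamma_top_with_chain(n):
    """u has followers u_0..u_n; u_0 is a leaf and u_i has the single follower u_{i-1}."""
    chain = [0]                          # value of the leaf u_0
    for _ in range(n):
        chain.append(mex([chain[-1]]))   # one follower: values alternate 0, 1, 0, 1, ...
    return mex(chain)


print([gamma_top_with_heaps(n) for n in (1, 10, 100)])  # [2, 11, 101]
print([gamma_top_with_chain(n) for n in (1, 10, 100)])  # [2, 2, 2]
```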

4 An Example 

We specify below a locally path-bounded digraph G = ( V , E) on some of whose 
vertices we place a finite number of tokens. A move consists of selecting a token 
and moving it to a follower. Multiple occupancy of vertices is permitted. The 
player first unable to move loses, and the opponent wins. If there is no last move, 
the outcome is a draw. 

For any r ∈ Z^0, a Nim-heap of size r is a digraph with vertices u_0, ..., u_r
and edges (u_j, u_i) for all 0 ≤ i < j ≤ r. In depicting G (Fig. 2), we use the
convention that bold lines and the vertices they connect constitute a Nim-heap,
of which only adjacent (bold) edges are shown, to avoid cluttering the drawing.
Thin lines denote ordinary edges.

All the horizontal lines are thin, and each vertex u_i on this horizontal line
connects via a vertical thin edge to a Nim-heap G_i pointing downwards, of
size ⌊(4i + 8)/3⌋, i ∈ Z^0. From u_i there also emanates a Nim-heap H_j of size
j = ⌊(i + 2)/3⌋ pointing upward. Each G_i has a back edge to its top vertex
forming a cycle of length i + 1. Thus G_0 has a loop at the top of its Nim-heap
(of size 2). There is an additional back edge to the vertex u_i on the horizontal
line, forming a cycle of length i + 3.
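As a sanity check on the Nim-heaps used as building blocks here, the following sketch builds an isolated Nim-heap of size r as a followers map and computes its classical Sprague-Grundy values by the mex rule. It assumes a finite acyclic digraph (no back edges), so all values are finite; the helper names are illustrative only.

```python
def mex(values):
    """Smallest nonnegative integer not in the finite collection `values`."""
    seen = set(values)
    m = 0
    while m in seen:
        m += 1
    return m


def nim_heap(r):
    """Followers map of a Nim-heap of size r: vertex j has an edge to every i < j."""
    return {j: list(range(j)) for j in range(r + 1)}


def grundy_acyclic(followers):
    """Sprague-Grundy values on a finite acyclic digraph given as {vertex: followers}."""
    memo = {}

    def value(u):
        if u not in memo:
            memo[u] = mex(value(v) for v in followers[u])
        return memo[u]

    return {u: value(u) for u in followers}


print(grundy_acyclic(nim_heap(4)))  # {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}
```

The vertex u_i of the heap receives value i, which is the labeling the text relies on when it speaks of "the top vertex of a Nim-heap of size i".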




Fig. 2. The tail-end of a locally path-bounded digraph. 



From any vertex u_i on the horizontal line there is a longest path, via G_i, of
length ⌊(4i + 11)/3⌋, and the other vertices have shorter maximal length. Thus
G is locally path-bounded. But it is not path-bounded, since i can be arbitrarily
large.
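The floor expressions above are easy to tabulate. This small sketch (the function name `sizes` is chosen here) evaluates the size of G_i, the size of the upward heap, and the stated longest-path length from u_i, and shows that the local bound grows with i.

```python
def sizes(i):
    """Return (size of G_i, size of the upward heap, longest path length from u_i)."""
    g_size = (4 * i + 8) // 3     # floor((4i + 8) / 3)
    h_size = (i + 2) // 3         # floor((i + 2) / 3)
    longest = (4 * i + 11) // 3   # floor((4i + 11) / 3)
    return g_size, h_size, longest


for i in range(4):
    print(i, sizes(i))
# 0 (2, 0, 3)
# 1 (4, 1, 5)
# 2 (5, 1, 6)
# 3 (6, 1, 7)
```

In every row the longest path is one edge longer than the size of G_i, which matches a path that first takes the vertical thin edge and then runs down the whole heap.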

Sample Problem. Compute an optimal strategy for the 5-token game placed 
on the 5 starred vertices of G. 

To solve this problem, we introduce the generalized Nim-sum (Id, Id, 0).
For any nonnegative integer h we write h = Σ_{i≥0} h^i 2^i for the binary encoding
of h (h^i ∈ {0,1}). If a and b are nonnegative integers, then their Nim-sum
a ⊕ b = c, also called exclusive or, XOR, or addition over GF(2), is defined by
c^i ≡ a^i + b^i (mod 2), c^i ∈ {0,1} (i ≥ 0).

The Generalized Nim-sum of a nonnegative integer a and ∞(L), for any finite
subset L ⊂ Z^0, is defined by a ⊕ ∞(L) = ∞(L) ⊕ a = ∞(L ⊕ a), where L ⊕ a =
{ℓ ⊕ a : ℓ ∈ L}. The Generalized Nim-sum of ∞(L_1) and ∞(L_2), for any finite
subsets L_1, L_2 of Z^0, is defined by ∞(L_1) ⊕ ∞(L_2) = ∞(L_2) ⊕ ∞(L_1) = ∞(∅).
Clearly the Generalized Nim-sum is associative and a ⊕ a = 0 for every a.
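The three rules of the Generalized Nim-sum translate directly into code. The following is a sketch, not a reference implementation: the class `Inf` (representing ∞(K) by its finite set K) and the function names are assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from functools import reduce
from typing import FrozenSet, Iterable, Union


@dataclass(frozen=True)
class Inf:
    """The value ∞(K); K is the finite set of nonnegative integers attached to it."""
    K: FrozenSet[int]


Value = Union[int, Inf]


def nim_add(a: Value, b: Value) -> Value:
    """Generalized Nim-sum of two values."""
    if isinstance(a, Inf) and isinstance(b, Inf):
        return Inf(frozenset())                    # ∞(L1) ⊕ ∞(L2) = ∞(∅)
    if isinstance(a, Inf):
        return Inf(frozenset(k ^ b for k in a.K))  # ∞(L) ⊕ b = ∞(L ⊕ b)
    if isinstance(b, Inf):
        return Inf(frozenset(k ^ a for k in b.K))  # a ⊕ ∞(L) = ∞(L ⊕ a)
    return a ^ b                                   # ordinary Nim-sum (bitwise XOR)


def nim_sum(values: Iterable[Value]) -> Value:
    """Generalized Nim-sum of a finite sequence of values."""
    return reduce(nim_add, values, 0)


print(nim_sum([3, 5, 6]))                 # 0
print(nim_add(5, Inf(frozenset({0, 1})))) # ∞({5, 4}), shown as an Inf object
```

Representing ∞(K) as a frozen dataclass keeps the values hashable and makes equality of two infinite values coincide with equality of their sets, as in the definition of γ(u) = γ(v).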

Given any finite or infinite game Γ, we say informally that a P-position is any
position u from which the Previous player can force a win, that is, the opponent
of the player moving from u. An N-position is any position v from which the
Next player can force a win, that is, the player who moves from v. A D-position
is any position u from which neither player can force a win, but has a nonlosing
next move. The set of all P-, N- and D-positions is denoted by 𝒫, 𝒩 and 𝒟
respectively.

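For any finite multiset u = (u_1, ..., u_n) of vertices of G on which tokens
reside, one token on each u_i, write σ(u) = γ(u_1) ⊕ ⋯ ⊕ γ(u_n) for the generalized
Nim-sum of the token values. We then have the result [3]:

Proposition. The P-, N- and D-labels of u = (u_1, ..., u_n) in any locally
path-bounded digraph G are given by

\[ \mathcal{P} = \{u \in V : \sigma(u) = 0\}, \qquad \mathcal{D} = \{u \in V : \sigma(u) = \infty(K),\ 0 \notin K\}, \]

\[ \mathcal{N} = \{u \in V : 0 < \sigma(u) < \infty\} \cup \{u \in V : \sigma(u) = \infty(K),\ 0 \in K\}. \qquad ■ \]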

We are now ready to solve the above problem, by observing that the symbols
appearing on Fig. 2 are the γ-values of G. Simply check that they satisfy the
conditions of Definition 2. In particular, B of Definition 2 is satisfied if every
vertex on the horizontal line with γ-value < ∞ gets a counter-value between
ω and ω2, and every vertex in the Nim-heaps with γ-value < ∞ is assigned a
counter value < ω, which is clearly feasible.

For the 5 starred vertices we then have 1 ⊕ 3 ⊕ 2 ⊕ 4 ⊕ ∞(0, 1, 2, 3, 4) =
4 ⊕ ∞(0, 1, 2, 3, 4) = ∞(4, 5, 6, 7, 0), which contains 0, hence the position is in
𝒩. Thus the player moving from this position can win by going to a position
of Nim-sum 0, namely, pushing the token on the infinity label to 4. Indeed the
resulting Nim-sum is 1 ⊕ 3 ⊕ 2 ⊕ 4 ⊕ 4 = 0.
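The computation for the starred vertices can be replayed mechanically. The sketch below restates the generalized Nim-sum and adds a classifier following the Proposition; the names `Inf`, `nim_add`, `nim_sum` and `outcome` are illustrative choices made here, and the γ-values 1, 3, 2, 4, ∞(0, 1, 2, 3, 4) are taken from the text.

```python
from dataclasses import dataclass
from functools import reduce
from typing import FrozenSet


@dataclass(frozen=True)
class Inf:
    """The value ∞(K), stored through its finite set K."""
    K: FrozenSet[int]


def nim_add(a, b):
    """Generalized Nim-sum of two values (integers or Inf objects)."""
    if isinstance(a, Inf) and isinstance(b, Inf):
        return Inf(frozenset())
    if isinstance(a, Inf):
        return Inf(frozenset(k ^ b for k in a.K))
    if isinstance(b, Inf):
        return Inf(frozenset(k ^ a for k in b.K))
    return a ^ b


def nim_sum(values):
    return reduce(nim_add, values, 0)


def outcome(sigma):
    """Label a position by its generalized Nim-sum, as in the Proposition."""
    if isinstance(sigma, Inf):
        return "N" if 0 in sigma.K else "D"
    return "P" if sigma == 0 else "N"


start = [1, 3, 2, 4, Inf(frozenset({0, 1, 2, 3, 4}))]
sigma = nim_sum(start)
print(sorted(sigma.K), outcome(sigma))  # [0, 4, 5, 6, 7] N

after_move = [1, 3, 2, 4, 4]            # token on the ∞ vertex pushed to value 4
print(nim_sum(after_move), outcome(nim_sum(after_move)))  # 0 P
```

Two tokens on ∞-labeled vertices give ∞(∅), which the classifier reports as a draw regardless of the other tokens.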

We remark that any tokens on two vertices with γ-value ∞ is a draw position,
no matter where the other tokens are, if any. Also note that for realizing a win
in this game we do not really need a counter function.

Epilogue

We have defined locally path-bounded digraphs, and shown that the generalized
Sprague-Grundy function γ exists on such digraphs with finite, though not
necessarily bounded, values. Of course local path-boundedness is only a sufficient
condition for the existence of γ. Any finite or infinite digraph without a leaf
satisfies trivially γ(u) = ∞ for all its vertices u.

A large part of combinatorial game theory is concerned, however, with di- 
graphs which do have leaves. If we exclude digraphs without leaves, then the 
condition of local boundedness is, in a sense, best possible, as stated in the last 
part of Theorem 1.

Abstract. In this paper, we explain why Go is hard to program.
Since the strategy of the game is closely related to the concept of
alive-dead group, it is plainly necessary to analyze this concept. For this a
mathematical model is proposed. Then we turn our research to Tsume-Go
problems in which one of the players always has a unique good move
and the other always has only two good moves available to choose from.
We show that problems of this kind are NP-complete.



1 Introduction 

The creation of a program that plays the game of Go, even at a medium level, is 
known to be difficult. Our aim in this paper is to give a reasonable explanation 
to this problem by analyzing the fundamentals of Go, more precisely, the concept 
of living group. What territories are, is a problem that can be reduced eventually 
to the life and death problem (tsume-Go) of certain groups placed on the board. 
Strong players always look to groups to find out if they are thick or weak, that 
is if they can live easily or not. In fact, the core of the strategy of the game is 
the understanding of what is a living or a dead group. 

The Go programmer’s dream is to find a good pruning function that would
greatly reduce the search tree for the problems emerging during the game,
problems that can be reduced, as we said before, to life and death problems. What we
will show is that even if one has an exceptionally good pruning function, there
are positions in which the complexity of the search is NP-complete.

In the past, results on the complexity of Go were obtained by Lichtenstein and 
Sipser [I|, who have constructed Go positions without kos which are PSPACE- 
hard. Using kos, Robson P] has shown Go is EXPTIME-complete. On the other 
hand, Morris (2| showed that playing sums of relatively simple combinatorial 
games is NP-hard and then, Yedwab P] and Moews pj showed that even a sum 
of games of the form a‖b|c is NP-hard. For further references see Berlekamp pj.

Let 𝒢 be a group and T(𝒢) a tree constructed recursively as follows: the root
is the (initial) position to which 𝒢 belongs, and the children of any node v are
all the positions obtained through a legal move from v. We also denote by Ψ(v)
a pruning function for T(𝒢), that is, a set containing some descendants of the
node v. Let T_Ψ(𝒢) be the subtree generated by Ψ(v) with the same root as T(𝒢)
and then, recursively, Ψ(v) is the set of children of any node v.

Let G_NP be the set of all groups 𝒢 for which there is a pruning function
Ψ(v), computable in polynomial time, that satisfies |Ψ(v)| = 1 always for one of
the players and |Ψ(v)| = 2 for the other for all nodes v of T_Ψ(𝒢), and for which one can
determine if 𝒢 can be removed or not from the board only by scanning in T_Ψ(𝒢)
at a depth which is polynomial in the size of the board.

Let us consider the problem: 

(GNP) Given 𝒢 ∈ G_NP, decide whether 𝒢 is a dead group.

Our main result is: 

Theorem 1. The problem (GNP) is NP-complete.

In Sect. 2 we present a mathematical model of the game of Go, in Sect. 3
we give an algorithm that checks if a group is alive or not, while in Sect. 4 we
prove Theorem 1.



2 A Mathematical Model of Go without the Rule of Ko 

Let T be a nonempty finite set, which we call the board, and 𝒩 ⊆ T × T a
symmetric binary relation on T, named the neighboring relation. If (x, y) ∈ 𝒩
we say that x and y are neighbors and write x𝒩y. Let S = {B, W, ∅} be the set of
colors (B stands for Black, W for White and ∅ for an empty point) and S* = {B, W}
the set of players. If a ∈ S* we denote by ā the other element of S*.

Let 𝓕 be the set of configurations, which are defined to be the set of all
functions from T to S. (For example, the fact that f(x) = W means that in x
a white stone is placed.) For each configuration f, it is natural to associate a
binary relation 𝒩_f ⊆ T × T. Thus, we write x𝒩_f y iff x𝒩y and f(x) = f(y).
It is easy to see that the reflexive and transitive closure of 𝒩_f is an equivalence
relation and its classes of equivalence will be called groups. We denote by 𝒢^x_f the
group that contains x. The color of the group 𝒢^x_f is Color(𝒢^x_f) = f(x).

For any Black or White group we define the set of liberties by

\[ \mathcal{L}^x_f = \{y \in T : f(y) = \emptyset \text{ and } y\,\mathcal{N}\,z \text{ for some } z \in \mathcal{G}^x_f\}, \]

that is the empty neighbors of 𝒢^x_f. The number of liberties of 𝒢^x_f is the cardinal
of 𝓛^x_f and will be denoted by ‖𝒢^x_f‖.
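Groups and liberties as defined above can be computed by a simple graph search. The sketch below assumes a concrete instance of the model that the paper does not fix: the board T is an n × n grid, the neighboring relation is 4-adjacency, and a configuration is a Python dict from points to colors. All names are illustrative.

```python
EMPTY, BLACK, WHITE = ".", "B", "W"


def grid_neighbors(n):
    """Neighboring relation of an n x n grid board (an assumed choice of T and N)."""
    def nbrs(pt):
        r, c = pt
        steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
        return [(r + dr, c + dc) for dr, dc in steps
                if 0 <= r + dr < n and 0 <= c + dc < n]
    return nbrs


def empty_board(n):
    """The configuration f_0 with every point empty."""
    return {(r, c): EMPTY for r in range(n) for c in range(n)}


def group(f, nbrs, x):
    """Equivalence class of x: points reachable through same-colored neighbors."""
    seen, stack = {x}, [x]
    while stack:
        y = stack.pop()
        for z in nbrs(y):
            if z not in seen and f[z] == f[x]:
                seen.add(z)
                stack.append(z)
    return frozenset(seen)


def liberties(f, nbrs, x):
    """Empty points adjacent to the group of x (meant for Black or White groups)."""
    return {y for z in group(f, nbrs, x) for y in nbrs(z) if f[y] == EMPTY}


nbrs = grid_neighbors(3)
f = empty_board(3)
f[(0, 0)] = f[(0, 1)] = BLACK
f[(1, 0)] = WHITE
print(sorted(group(f, nbrs, (0, 0))))      # [(0, 0), (0, 1)]
print(sorted(liberties(f, nbrs, (0, 0))))  # [(0, 2), (1, 1)]
print(len(liberties(f, nbrs, (1, 0))))     # 2
```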

During the game, any configuration is paired with a player whose turn it is to
make the next move, thus we need to introduce the set of positions, which is
defined to be 𝒫 = 𝓕 × S*. The fact that p = (f, a) ∈ 𝒫 means that in the
configuration f it is a’s turn to move, and for a group that contains x in position
p, the notation 𝒢^x_p will be used rather than 𝒢^x_f.

Now we are ready to define the moving function, which we denote by φ: 𝒫 ×
T* → 𝒫. Let T* = T ∪ {pass}. If p, p′ ∈ 𝒫 and x ∈ T* then p′ = φ(p, x) =
φ((f, a), x) will mean that in the position p the player a has placed his stone in
x and the outcome is the position p′.

There are two cases.

1. x = pass or f(x) ≠ ∅. We define

\[ \varphi((f,a),x) = (f,\bar a), \]

which means that if one of the players moves pass then we get the same
configuration with the other player’s turn to move and playing on an occupied place
is the same as playing pass.

Let us note that this also says that to play pass is allowed in the initial
position p_0 = (f_0, B), where f_0(x) = ∅ for any x ∈ T.



2. f(x) = ∅. In this case, we may or may not obtain a different configuration,
depending on whether the move captures some stones.

Let

\[ g(z) = \begin{cases} f(z) & \text{if } z \neq x \\ a & \text{if } z = x \end{cases} \]

be the configuration obtained by placing a stone of color a on x. Let

\[ \mathrm{Capture}(p,x) = \{\mathcal{G}^z_g : g(z) = \bar a,\ z\,\mathcal{N}\,x \text{ and } \|\mathcal{G}^z_g\| = 0\} \]

be the groups captured by the move, that is the opposite color groups which are
neighbors to x and have no liberties.

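If Capture(p,x) ≠ ∅, then let

\[ h(z) = \begin{cases} g(z) & \text{if } z \notin \mathrm{Capture}(p,x) \\ \emptyset & \text{if } z \in \mathrm{Capture}(p,x) \end{cases} \]

be the configuration obtained from g after removing the groups with no liberties.
Then, since suicide is forbidden, we define:

\[ \varphi((f,a),x) = \begin{cases} (f,a) & \text{if } \mathrm{Capture}(p,x) = \emptyset \text{ and } \|\mathcal{G}^x_g\| = 0 \\ (h,\bar a) & \text{if } \mathrm{Capture}(p,x) \neq \emptyset \\ (g,\bar a) & \text{if } \mathrm{Capture}(p,x) = \emptyset \text{ and } \|\mathcal{G}^x_g\| \neq 0 \end{cases} \]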
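The two cases of the moving function can be sketched as follows. The code assumes, purely for illustration, an n × n grid with 4-adjacency as board and dicts as configurations; it also assumes the reading in which a suicidal move leaves the position unchanged with the same player to move. The helper names are not from the paper.

```python
EMPTY, BLACK, WHITE = ".", "B", "W"


def grid_neighbors(n):
    """Neighboring relation of an n x n grid (assumed board)."""
    def nbrs(pt):
        r, c = pt
        steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
        return [(r + dr, c + dc) for dr, dc in steps
                if 0 <= r + dr < n and 0 <= c + dc < n]
    return nbrs


def group(f, nbrs, x):
    """Points connected to x through same-colored neighbors."""
    seen, stack = {x}, [x]
    while stack:
        y = stack.pop()
        for z in nbrs(y):
            if z not in seen and f[z] == f[x]:
                seen.add(z)
                stack.append(z)
    return seen


def liberties(f, nbrs, x):
    """Empty points adjacent to the group of x."""
    return {y for z in group(f, nbrs, x) for y in nbrs(z) if f[y] == EMPTY}


def move(p, x, nbrs):
    """Moving function: p = (f, a) is a position, x a board point or "pass"."""
    f, a = p
    other = WHITE if a == BLACK else BLACK
    if x == "pass" or f[x] != EMPTY:
        return (f, other)               # case 1: pass, or play on an occupied point
    g = dict(f)
    g[x] = a                            # configuration g: stone of color a placed on x
    captured = set()
    for z in nbrs(x):
        if g[z] == other and not liberties(g, nbrs, z):
            captured |= group(g, nbrs, z)
    if captured:                        # configuration h: captured groups removed
        h = {z: (EMPTY if z in captured else color) for z, color in g.items()}
        return (h, other)
    if not liberties(g, nbrs, x):
        return (f, a)                   # suicide (assumed reading: position unchanged)
    return (g, other)


nbrs = grid_neighbors(3)
f = {(r, c): EMPTY for r in range(3) for c in range(3)}
f[(0, 0)] = WHITE
f[(0, 1)] = BLACK
h, turn = move((f, BLACK), (1, 0), nbrs)
print(h[(0, 0)], turn)                  # . W  (the corner stone was captured)
```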



Given p_1, ..., p_n ∈ 𝒫, let Φ({p_1, ..., p_n}) be the set of all positions that can
be obtained from p_1, ..., p_n by playing any possible move. Thus, Φ: 2^𝒫 → 2^𝒫
and

\[ \Phi(\{p_1,\dots,p_n\}) = \bigcup_{i=1}^{n} \bigcup_{x \in T^*} \varphi(p_i, x), \]

where 2^𝒫 denotes the set of all subsets of 𝒫. In particular, Φ^k({p_0}) is the set
of all positions that can be obtained from the initial position p_0 in exactly k
moves. If the position p′ is obtained from p in k moves we write shortly p ⇒^k p′.

The following theorem shows that the "history" that created a position plays
no role in the study of a legal position of the game.

Theorem 2. Let G(p_0) be the set of all positions generated by p_0. Then, p =
(f, a) ∈ G(p_0) iff ‖𝒢^x_p‖ ≠ 0 for any x ∈ T with f(x) ≠ ∅.

Proof. Clearly p_0 has the property that ‖𝒢^x_{p_0}‖ ≠ 0 for any x ∈ T with f_0(x) ≠
∅. Suppose now that for the position p = (f, a) we have that ‖𝒢^x_p‖ ≠ 0 for
any x ∈ T with f(x) ≠ ∅. Then, for any position p′ with p ⇒^1 p′, we have
that ‖𝒢^x_{p′}‖ ≠ 0 because φ removes the groups without liberties. By induction,
we obtain that for any p ∈ G(p_0) we have that ‖𝒢^x_p‖ ≠ 0 for any x ∈ T with
f(x) ≠ ∅.

To prove the other implication, let p = (f, a) be a position and let x_1, ..., x_{|T|}
be an ordering of T such that the sequence f(x_1), ..., f(x_{|T|}) has the form

\[ \underbrace{B, W, \dots}_{l \text{ colors}},\ \underbrace{X, \dots, X}_{m \text{ colors}},\ \underbrace{\emptyset, \dots, \emptyset}_{n \text{ colors}} \]

for some l, m, n ≥ 0, l + m + n = |T|, and X ∈ S*. Then p_0 ⇒^k p through
x_1, ..., x_{|T|} in k ≤ l + 2m + 1 steps (the step from W to X may need a pass
move, from X to X requires a pass move and from X to a may need a pass
move). Since ‖𝒢^x_f‖ ≠ 0 for any x ∈ T with f(x) ≠ ∅ it follows that the same
is true for all the intermediary positions between p_0 and p, therefore all these
positions are correctly constructed (without suicidal moves). Although it is not
necessary, note that the way in which p is obtained from p_0 takes a minimum
number of steps. This completes the proof of the theorem.

3 The Life and Death Algorithm 

By the definition of the moving function, one can see that if the group 𝒢^x_p has only
one liberty (i.e. ‖𝒢^x_p‖ = 1) where p = (f, a) and Color(𝒢^x_p) = ā then the group
𝒢^x_p can be captured by taking that liberty. Let then

\[ \mathcal{D}_0 = \{\mathcal{G}^x_p : p \in G(p_0),\ p = (f,a),\ \mathrm{Color}(\mathcal{G}^x_p) = \bar a \text{ and } \|\mathcal{G}^x_p\| = 1\} \]

be the set of groups which can be captured in one move. For any i ≥ 1, we define
recursively the sets 𝒟_i of groups that can be captured in at most i+1 moves. For
this, let us note that Go players have different points of view on their groups.
Thus, the player a who has a group 𝒢^x_p will convince himself that the group is
dead if for all his possible moves he will get a dead group, and the opponent ā
needs to find only one move to kill it. This leads to the following definition



\[
\begin{aligned}
\mathcal{D}_i = \mathcal{D}_{i-1}
&\cup \{\mathcal{G}^x_p : p \in G(p_0),\ p = (f,a),\ \mathrm{Color}(\mathcal{G}^x_p) = \bar a \text{ and } \mathcal{G}^x_{p'} \in \mathcal{D}_{i-1} \text{ for some } p' \in \Phi(\{p\})\} \\
&\cup \{\mathcal{G}^x_p : p \in G(p_0),\ p = (f,a),\ \mathrm{Color}(\mathcal{G}^x_p) = a \text{ and } \mathcal{G}^x_{p'} \in \mathcal{D}_{i-1} \text{ for every } p' \in \Phi(\{p\})\}.
\end{aligned}
\]

Now, we are ready to define the set of all dead groups by

\[ \mathcal{D} = \bigcup_{i \in \mathbb{N}} \mathcal{D}_i \]

and the set of living groups by

\[ \mathcal{V} = \{\mathcal{G}^x_p : p \in G(p_0),\ \mathcal{G}^x_p \notin \mathcal{D}\}. \]

Since 𝒫 is finite and the set of groups in a certain position is also finite, it
follows that 𝒟 is finite. This implies that there is an integer, call it α, for which
𝒟_α = 𝒟_{α+1}, therefore, by the definition of 𝒟 it follows that 𝒟 = 𝒟_α. For α we
may take the trivial upper bound 2|T|·3^{|T|}.

The following is an algorithm that analyses if a group can be killed or not. 

procedure Can_kill(p : G(p_0), x : T, depth : ℕ) : boolean
var w : boolean; Y ∈ 2^{G(p_0)};
begin
    if 𝒢^x_p ∈ 𝒟_0 then return TRUE;
    if depth > α then return FALSE;
    if (p = (f, a) AND Color(𝒢^x_p) ≠ a)
        w = FALSE; Y = Φ({p});
        while (Y ≠ ∅ AND NOT w)
            p′ ∈ Y; Y = Y \ {p′};
            w = w OR Can_kill(p′, x, depth + 1);
        endwhile
        return w;
    else
        w = TRUE; Y = Φ({p});
        while (Y ≠ ∅ AND w)
            p′ ∈ Y; Y = Y \ {p′};
            w = w AND Can_kill(p′, x, depth + 1);
        endwhile
        return w;
    endif
end.
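The control structure of Can_kill is an AND/OR search with a depth cut-off. The sketch below isolates that structure; it is not the Go-specific procedure. The callbacks `successors` (playing the role of the set of positions reachable in one move), `in_d0` (membership of the tracked group in the set of groups capturable in one move) and `attacker_to_move` (the player to move does not own the group) are hypothetical parameters introduced here, and the toy tree at the end is invented for the demonstration.

```python
def can_kill(p, successors, in_d0, attacker_to_move, alpha, depth=0):
    """AND/OR search mirroring the structure of the Can_kill procedure."""
    if in_d0(p):
        return True
    if depth > alpha:
        return False
    children = successors(p)
    if attacker_to_move(p):
        # The attacker needs only one killing move (OR node).
        return any(can_kill(q, successors, in_d0, attacker_to_move, alpha, depth + 1)
                   for q in children)
    # The defender loses only if every move leads to a dead group (AND node).
    return all(can_kill(q, successors, in_d0, attacker_to_move, alpha, depth + 1)
               for q in children)


# Toy game tree: the attacker moves at "root" and at the leaves, the defender at "a", "b".
tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"],
        "a1": [], "a2": [], "b1": []}
attackers = {"root", "a1", "a2", "b1"}

killed = can_kill("root", tree.__getitem__, {"a1", "a2"}.__contains__,
                  attackers.__contains__, alpha=10)
print(killed)  # True: after "a", both defender replies reach a capturable position
```

Python's `any` and `all` stop at the first decisive child, which corresponds to the `NOT w` and `w` guards of the two while loops.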

The next theorem proves that this algorithm is correct. 

Theorem 3. 𝒢^x_p ∈ 𝒟 iff Can_kill(p, x, 0) = TRUE.

Proof. Since trivially |G(p_0)| ≤ 2·3^{|T|}, the algorithm stops eventually. By
induction on i we immediately see that 𝒢^x_p ∈ 𝒟_i implies that Can_kill(p, x, 0) = TRUE,
and by the definition of 𝒟 it follows that the same is true for 𝒟.

Conversely, let Can_kill_n(p, x, 0) be the above algorithm with α replaced by
n. It is not difficult to prove by induction on n that Can_kill_n(p, x, 0) = TRUE
implies 𝒢^x_p ∈ 𝒟_n for any n ≥ 0. This completes the proof of the theorem.

Let us remark that to check that 𝒢^x_p ∈ 𝒟_0 takes O(|T|) steps. The calculation
of φ needs O(|T|) steps, too. We also mention that if the depth of the calculation
is d then O(d·|T|) is the memory space needed.

We conclude this section with some observations on kos. So far no restrictions
on the cycling of positions were imposed in our mathematical model. Thus, since
repetitions were allowed, the set 𝒱 of living groups is larger than in a usual
game with the rule of ko.

For example, in Fig. 1 and 2, if White moves then his bigger groups are 
alive, although in a normal game the first is dead while the second is in an 
undetermined position and special extra rules apply. Since our main object in 
this work is to prove the complexity of a certain set of groups in which, as we 
will next see, no kos are involved, we don’t go in further detail with the ko rule. 




Fig. 1. A 2-kos position in which White lives.

Fig. 2. In a 3-kos position both Black and White live.



4 Proof of Theorem 1

The proof has two parts. Firstly we show that GNP is an NP problem and
secondly we reduce the 3SAT problem to the life and death problem of certain
groups from G_NP. The theorem will then follow, since 3SAT is NP-complete, a
result proved by Cook in 1971.

We begin by showing that GNP belongs to NP. For this, let us consider a 
nondeterministic Turing machine which performs the following algorithm: 

input 𝒢^x_p ∈ G_NP
guess p_1, ..., p_{|T|} with p = p_1
check that p_{i+1} ∈ Ψ(p_i) for 1 ≤ i ≤ |T| − 1
if (Color(𝒢^x_p) = a AND |Ψ((f, a))| = 2)
    check that 𝒢^x_{p_i} ∉ 𝒟_0 for every p_i
    if this holds then return FALSE
else // Color(𝒢^x_p) = ā AND |Ψ((f, a))| = 2
    check that 𝒢^x_{p_i} ∈ 𝒟_0 for some p_i
    if this holds then return TRUE
endif
end

The computation of this algorithm takes at most O(|T|^2) steps.

For the second part of the proof, we show how the 3SAT problem can be
reduced to the life and death problem of certain groups. We first state the 3SAT
problem. Let us consider the formula F = c_1 ∧ ⋯ ∧ c_n where c_j = ũ_{i_1} ∨ ũ_{i_2} ∨ ũ_{i_3},
ũ_{i_k} ∈ {u_{i_k}, ū_{i_k}} for 1 ≤ k ≤ 3 and 1 ≤ j ≤ n, and U = {u_1, ..., u_m} is the set of
logical variables that appear in F. Then the 3SAT problem is:

(3SAT) Decide whether some assignment of the variables in U makes F evaluate to true.
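For small instances the 3SAT question can be answered by exhaustive search, which makes the problem statement concrete. In the sketch below a literal is encoded as a signed integer (+i for u_i, −i for its negation); this encoding and the function names are conventions chosen here, not taken from the paper.

```python
from itertools import product


def evaluate(formula, assignment):
    """True iff every clause contains a literal made true by `assignment`.

    `formula` is a list of clauses, each a tuple of nonzero integers;
    `assignment` maps a variable index i to the truth value of u_i.
    """
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in formula)


def satisfying_assignment(formula, m):
    """Return a satisfying assignment of u_1..u_m, or None if there is none."""
    for bits in product([True, False], repeat=m):
        assignment = dict(enumerate(bits, start=1))
        if evaluate(formula, assignment):
            return assignment
    return None


# F = (u1 v not-u2 v u3) ^ (not-u1 v u2 v not-u3)
F = [(1, -2, 3), (-1, 2, -3)]
print(satisfying_assignment(F, 3) is not None)   # True

# All eight sign patterns over three variables: no assignment satisfies them all.
G = [(s1 * 1, s2 * 2, s3 * 3) for s1, s2, s3 in product((1, -1), repeat=3)]
print(satisfying_assignment(G, 3))               # None
```

The search takes 2^m evaluations, which is exactly the exponential cost that the reduction in this section transfers to the life and death problem.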

Let p(i) and q(i) be the number of appearances in F of u_i and ū_i respectively.
For each i (1 ≤ i ≤ m) we construct the diagram A(i) (see Fig. 3).

Fig. 3. Diagram A(i).



Remark 1. Suppose that in diagram A(i), White moves. Then:

1. ‖G(i)‖ = 2, ‖Ḡ(i)‖ = 2 and ‖H_i‖ = 3.

2. If White plays A and B then H_i is dead.

3. If White plays A and Black plays C then G(i) and H_i are alive and Ḡ(i)
is dead.

4. If White plays B and Black plays D then Ḡ(i) and H_i are alive and G(i)
is dead.

For any c_j = ũ_{i_1} ∨ ũ_{i_2} ∨ ũ_{i_3}, 1 ≤ j ≤ n, we associate a diagram B(j) (see
Fig. 4). For any k, 1 ≤ k ≤ 3, this diagram is connected with diagram
A(i_k) as follows. If ũ_{i_k} = u_{i_k} then G̃(i_k) is G(i_k), the connection being through
x_l for some l, 1 ≤ l ≤ p(i_k). If ũ_{i_k} = ū_{i_k} then G̃(i_k) is Ḡ(i_k), the connection
being through y_l for some l, 1 ≤ l ≤ q(i_k).

Remark 2. In diagram B(j) the following hold true: 

1. ‖G̃(i_l)‖ = 2 and ‖L(i_l)‖ = 3 for 1 ≤ l ≤ 3.

2. The group K_j is alive iff the group G̃(i_l) is alive for some l, 1 ≤ l ≤ 3.

Finally, all the previous diagrams are interconnected also with diagram C
(see Fig. 5). Thus, all the groups H_i are connected to those from diagram
A(i) through z_i for 1 ≤ i ≤ m. Similarly, all the groups K_j are connected
to those from diagram B(j) through w_j for 1 ≤ j ≤ n.

Remark 3. Suppose that in diagram C, White moves. Then we have:

1. P|| = ||rt|| = 2. 

2. The group A is alive iff either H_i is dead for some i, 1 ≤ i ≤ m, or
K_j is alive for all j, 1 ≤ j ≤ n.

The construction of all these diagrams needs O(n) steps, so the reduction
is made in a polynomial number of steps.

Remark 4. (Easy life). For White, an easy way to live with A is by moving
A (or B) in diagram A(i) when Black plays poorly by failing to respond with
C (or D) in the same diagram. This allows White to occupy both A and B and
then, by 2 of Remark 1, H_i is dead, so A is alive by 2 of Remark 3.

Let us now see that the 3SAT problem is equivalent to the life and death
problem of the group A from diagram C. In the position given by the above
construction suppose that White moves. We denote by $v(u_i) \in \{\text{TRUE}, \text{FALSE}\}$
any assignment of the logical variables $u_i$, $1 \le i \le m$. The correspondence
between our construction and $v(u_i)$ is given by the convention:
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Fig. 4. Diagram B(j). 
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Fig. 5. Diagram C. 



$$ v(u_i) = \begin{cases} \text{TRUE} & \text{if } G(i) \text{ is alive} \\ \text{FALSE} & \text{if } \overline{G}(i) \text{ is alive} \end{cases} \qquad (1) $$

We claim that any solution of the 3 SAT problem makes A alive and the 
converse is also true. 

Let $v(u_i) \in \{\text{TRUE}, \text{FALSE}\}$ for $1 \le i \le m$ be an assignment which validates
F. If $v(u_1)$ = TRUE then White moves at A in diagram $A(1)$ so Black is forced
to respond at C (otherwise A lives easily, cf. Remark 4). If $v(u_1)$ = FALSE then
White moves at B in diagram $A(1)$ so Black is forced to respond at D (otherwise A
lives easily, cf. Remark 4). White will then turn to the subsequent diagrams
for $i = 2, \ldots, m$ and play in the same way, giving Black only unique-choice
moves. Playing in this manner yields

$v(u_i) = \text{TRUE}$ iff $G(i)$ is alive, for $i = 1, \ldots, m$. (2)

Let us also note that White's moves make $K_j$ alive for $j = 1, \ldots, n$. For if $K_j$
is dead for some $j$, then by 2 of Remark 2 the groups $G'(i_l)$ from diagram $B(j)$
are all dead. By (2) it follows that the literal $\tilde{u}_{i_l}$ of $C_j$ takes the value FALSE for $l = 1, 2, 3$, so $C_j$ is FALSE,
which implies that F is FALSE, contradicting our assumption. Since the $K_j$ are alive
for $j = 1, \ldots, n$, by 2 of Remark 3 it follows that A is alive.
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Conversely, suppose that A is alive. As we observed before in Remark 4, A 
can live easily if Black plays poorly. Suppose that Black plays well. Then by 
2 of Remark 3, the only way for White to live with A is to live with all Kj 
for $j = 1, \ldots, n$. By 2 of Remark 2, $K_j$ lives iff $G'(i_l)$ lives for some $l = 1, 2, 3$.
But $G'(i_l)$ lives iff White plays in diagram $A(i_l)$ at A if $G'(i_l) = G(i_l)$ and at B
if $G'(i_l) = \overline{G}(i_l)$. Therefore, by the construction of $B(j)$ and 3, 4 of Remark 1,
the only moves for White which make A alive are A or B in diagram $A(i)$ for
$i = 1, \ldots, m$ and since Black plays well, he is forced to respond with C or D. Let
us also note that the order of diagrams $A(i)$ in which White makes the moves is
not important, since the result (after his m moves) is the same.

With the convention (1), the m White moves give values to the variables $u_i$ for
$i = 1, \ldots, m$. Now, because $K_j$ lives for all $j = 1, \ldots, n$, it means that $C_j$ is true
for $j = 1, \ldots, n$, which implies that F is TRUE. This concludes the proof of our
claim.
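The correspondence used in this proof can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the signed-integer encoding of literals, the function name `group_a_lives`, and the premise that Black answers every White move correctly (so no $H_i$ dies) are hypothetical choices made here, not part of the construction itself.

```python
def group_a_lives(clauses, assignment):
    """Mirror of the reduction, assuming Black plays well.

    clauses: list of 3-tuples of nonzero ints; k stands for u_k, -k for its negation.
    assignment: dict mapping variable index -> bool (the values v(u_i)).
    """
    def literal_group_alive(literal):
        # Convention (1): G(i) is alive iff v(u_i) is TRUE, otherwise the
        # complementary group is alive. A clause diagram is wired to the
        # group matching the sign of its literal.
        value = assignment[abs(literal)]
        return value if literal > 0 else not value

    # Remark 2: K_j lives iff at least one of its three connected groups lives.
    k_alive = [any(literal_group_alive(lit) for lit in clause) for clause in clauses]

    # Remark 3 with no dead H_i: A lives iff every K_j lives.
    return all(k_alive)


clauses = [(1, -2, 3), (-1, 2, 3)]
satisfying = {1: True, 2: True, 3: False}
falsifying = {1: True, 2: False, 3: False}
```

With these inputs, `group_a_lives(clauses, satisfying)` is true and `group_a_lives(clauses, falsifying)` is false, matching the truth value of the formula under each assignment.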

It remains to show that A is in G'np- In the above discussion, we saw that if 
White wants to live with A he has to move at A or B in diagram $A(i)$ for $i = 1, \ldots, m$
and we observed that the order of the $i$'s is not important. Moreover, trying to kill A,
Black had unique-choice responses. Thus, a pruning function with the required
properties exists. Note also that to check that A lives after White has played
his first m moves takes only $O(n)$ steps (in every diagram $B(j)$ it is enough to
check if one of the groups $G'(i_l)$ for $l = 1, 2, 3$ has 2 liberties), and the theorem
is proved.
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Abstract. In evaluating a local position in go, players want to know its current 
territorial count and the value of a local play. Many go positions are combina- 
torial games; the mean value of the game corresponds to the count and its tem- 
perature corresponds to the value of the play. Thermography finds the mean 
value and temperature of a combinatorial game. However, go positions often 
include kos, repetitive positions which are not classical combinatorial games. 
Thermography has been generalized to include positions containing a single ko. 
This paper extends thermography further to include positions with multiple 
kos. It also introduces a method for pruning redundant branches of the game 
tree. 

Keywords: go, ko, multiple kos, thermography



1 Introduction 

Thermography finds the mean values and temperatures of combinatorial games. Clas- 
sical thermography fails for kos, which have cyclical game graphs. Berlekamp [1] 
breaks the cycle by allowing one player, the komaster, to win the ko. But when more 
than one ko is active at one time, no player can win them all, as a rule. 

The method presented here converts the game graph to a pair of and/or trees. As 
long as the ko rules prevent the ko from continuing interminably, the trees are finite. 
Then we may apply thermography to the trees. 



1.1 Definition of the Thermograph 

The thermograph of a combinatorial game, G, shows, for each temperature, t, the Left
and Right scores of G cooled by t [2]. (By convention, in go Black is Left and White
is Right.) The Left (Right) score of a game is the result of alternating minmax play if
player Left (Right) plays first. The set of all (Left score (Right score), temperature)
pairs is the Left (Right) wall of the thermograph. A game is cooled by t by imposing a
tax equal to t on each play. Instead of cooling we shall make use of the relation between
the thermograph and minmax play in a universal enriched environment (UEE) [1].

Extended Thermography for Multiple Kos in Go
H.J. van den Herik, H. Iida (Eds.): CG'98, LNCS 1558, pp. 232-251, 1999.
Springer-Verlag Berlin Heidelberg 1999

A UEE is a sum of simple games (called switches) of the form {v | -v}, where Black
can play to a position worth v points and White can play to a position worth -v points
(for Black). (Switches {0 | 0} are familiar to go players. They call them dame.) The
temperature of {v | -v} is v. The temperature, t, of the UEE is the temperature of its
hottest component. To construct a UEE, $U^n_t$, let

$\delta_n = 1/\operatorname{lcm}\{1, 2, \ldots, n\}$ . (1)

Let $U^n_t$ consist of $(2n + 1)$ switches at each temperature $0, \delta_n, 2\delta_n, \ldots, t$. Let n be large
enough that the Left score of $U^n_t$ is

$LS(U^n_t) = t/2$ , (2)

the Right score is

$RS(U^n_t) = -t/2$ , (3)

the Left value of the thermograph of game G at temperature t is

$L(G_t) = LS(G + U^n_t) - LS(U^n_t)$ , (4)

and correspondingly the Right value is

$R(G_t) = RS(G + U^n_t) - RS(U^n_t)$ . (5)

Defining the walls of the thermograph by minmax considerations allows us to gener- 
alize thermography to include multiple kos. 
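A small numerical check may help make the environment concrete. The sketch below assumes one reading of the construction: temperatures spaced by 1/lcm{1, ..., n} from 0 up to t, with an odd number (2n + 1) of switches at each temperature. It also assumes that minmax play in a sum of switches takes the hottest switch first. The function names are hypothetical.

```python
from fractions import Fraction
from math import gcd


def lcm_upto(n):
    """Least common multiple of 1, 2, ..., n."""
    result = 1
    for k in range(1, n + 1):
        result = result * k // gcd(result, k)
    return result


def environment_left_score(t, n):
    """Left score of the assumed environment when Left moves first.

    Each switch {v | -v} gives +v to Left if Left takes it and -v if Right
    takes it; players alternate, always taking the hottest remaining switch.
    """
    delta = Fraction(1, lcm_upto(n))
    steps = Fraction(t) / delta
    if steps.denominator != 1:
        raise ValueError("t must be a multiple of the temperature spacing")
    temperatures = []
    for k in range(int(steps), -1, -1):          # hottest level first
        temperatures.extend([k * delta] * (2 * n + 1))
    score = Fraction(0)
    for index, v in enumerate(temperatures):
        score += v if index % 2 == 0 else -v     # Left takes even turns
    return score
```

For example, `environment_left_score(2, 2)` evaluates to 1 and `environment_left_score(3, 3)` to 3/2, that is t/2 in both cases. Under this reading the equality is exact when t is an even multiple of the spacing, which is one way to understand the requirement that n be large enough.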



1.2 Kos and Thermographs 

Kos are go positions which potentially repeat. Since repetition of board positions in 
go may lead to hung games, different versions of ko rules prevent some or all repeti- 
tions. For any version of ko rules we may draw a thermograph for a position con- 
taining one or more kos as long as the rules prohibit a hung game. 

A board position in which at least one of the options is prohibited by the ko rules is
called ko-banned [1]. In a ko-banned position minmax play may be in the environment.
The lines of a classical thermograph reflect only alternating local plays, but the
lines for ko thermographs must be able to reflect plays in the environment as well. Let 
us call a thermograph which reflects plays in the environment an extended thermo- 
graph (cf. [2]). 



2 Deriving Thermographs 

By convention the axes of a thermograph are rotated counterclockwise 90°. The tem- 
perature, t, is plotted on the vertical axis with positive values above the origin. The 
score for Left (Black), v, is plotted on the horizontal axis with positive values to the 
left of the origin. Each thermographic line represents a mast. If a go position is termi- 
nal, its thermograph is a vertical mast with the equation, v = m, the local score. In the 
extended thermograph, if a sequence of play from the original position reaches a terminal
position with a score of m, the equation of the corresponding thermographic
line is v = m + (w - b) t, where w is the number of local plays in the sequence by
White, and b is the number of local plays in the sequence by Black.




Fig. 1 . Scaffolds for game {4 | 0} 

In the combinatorial game {4 | 0}, Black can play to a position worth 4 points and
White can play to a position worth 0. Figure 1 shows the thermographic lines which
represent plays in that game, line (1) when Black plays first and line (2) when White
plays first. Line (1) is called the Left scaffold; line (2) is the Right scaffold [1]. Their
equations are

v = 4 - t , (1)

v = t . (2)


The scaffolds intersect at temperature t = 2. When t < 2, each player will prefer to 
play in the game rather than in the environment, and the scaffolds represent the Right 
and Left walls at t. 

But when t > 2, minmax play for each player is in the environment. Minmax play
at t is not sufficient to determine the Right and Left walls. When t = 3, for instance, it 
only tells us that the Left wall is greater than 1 and the Right wall is less than 3. 

However, minmax play through a range of temperatures, in this case a drop from 
temperature 3 to 2, will establish the Right and Left walls at temperature 3. Since 
each player plays in the environment, the gain of the first player between those tem- 
peratures is the same amount as it would be in the environment alone, without the 
game. The Left or Right wall at temperatures greater than 2 is the same as it is at tem- 
perature 2. 

When t ≥ 2 the Left and Right walls coincide at v = 2. The coincident walls form a
mast with equation

v = 2 . (3)
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The temperature of the game is 2 and its mean value is 2. Figure 2 shows the thermo- 
graph of { 4 | 0 } . 
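The thermograph of a simple switch {a | b} with a ≥ b follows the same pattern as this example, and can be sketched as below. The function name `switch_walls` is hypothetical; the code only restates the scaffold and mast rules given for {4 | 0}.

```python
from fractions import Fraction


def switch_walls(a, b, t):
    """Return (Left wall, Right wall) of the switch {a | b}, a >= b, at temperature t."""
    temperature = Fraction(a - b, 2)   # where the scaffolds v = a - t and v = b + t meet
    mean = Fraction(a + b, 2)          # the mast value
    if t >= temperature:
        # At or above the game's temperature both walls lie on the mast.
        return (mean, mean)
    # Below it, each wall follows its scaffold.
    return (a - t, b + t)
```

For {4 | 0} this gives walls (3, 1) at t = 1 and the mast value 2 on both walls at t = 2 and t = 3.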




Fig. 2. Thermograph of game { 4 | 0 } 



Note: 

• Plays in the environment are not sufficient to determine thermographic values. 

• The value of a simple mast, where both players prefer to play in the environment, 

is determined at its base. 

(Some kos have more complicated masts. Berlekamp [1] covers this important 
topic.) 

In this case, which is typical, the intersection of the Left and Right scaffolds de- 
termines the mast. It may also happen that the walls do not intersect, even at the low- 
est temperature. An example is the game {-6 | 8}. Then the game is terminal, and its 
value may be different under different rules. Its mast value is just its terminal value 
under the rules in use. 

(As a combinatorial game, the value of {—6 | 8} is 0. Go players recognize this 
game as a seki, and give it a territorial value of 0. Territory counting, as under Japa- 
nese rules, typically coincides with the value of such a game in combinatorial game 
theory.) 

In theory we can determine any thermograph by finding the Left and Right values
at each temperature for each line of play and eliminating the suboptimal ones. When
minmax play for each player is in the environment, however, we need not consider all
possible lines of play. We may simply assume that each player plays in the environment
until it is wrong to do so.

Since simple masts are determined at their bases, a typical method of drawing 
thermographs is to derive them bottom up, from the thermographs of their followers. 
That approach will not work with kos, however, unless modified, since ko positions 
ultimately follow themselves. 
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2.1 Thermographs for And/Or Trees 

In combinatorial game trees Left branches represent options for Left and Right 
branches represent options for Right. By contrast, in and/or game trees the branches 
represent options for the player with the move. This extension of thermography util- 
izes and/or trees. To convert a combinatorial game tree to an and/or game tree, we 
make plays in the environment explicit. Figure 3 shows the combinatorial game tree
for the game {2 || 0 | -6} on the left, and its corresponding and/or game trees on the
right.







Fig. 3. Two kinds of game tree 

There are actually two trees on the right, depending on who plays first from the 
root. The Left tree represents Black’s play to a position worth 2 points. In the Right 
tree White can play to B. From B Black can play to 0 or play in the environment. The 
‘e’ indicates a play in the environment, which leaves the game node the same, but 
changes who has the move. From B, White can play to -6. 

This and/or tree representation leaves out many lines of play. It ignores inferior 
plays in the environment, but they are dominated. It also ignores sequences of plays 
in the environment. After White plays to B and Black plays in the environment, it
might be correct for White to play in the environment, too. In fact, if Black’s play
was correct, White should almost certainly play in the environment.

But in that case, there is a mast at B at the current temperature. Such lines of play 
give no thermographic information. That is why we can prune them. 

Black’s environmental option establishes the fact that there is a mast at B. The Left 
and Right options at B carry all the information needed to find that mast. That fact 
leads to our basic principle of pruning: 

Ignore environmental plays at established masts. 

The first ‘e’ establishes the mast at B. No further ‘e’s are allowed from B. We also 
disallow environmental plays from the root. They would also be meaningless, since 
either player can play from the root. 

How can we construct the thermograph of the and/or tree? The process is essen- 
tially the same as with combinatorial game trees. We work bottom up from the fol- 
lowers. 
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To reach -6 from the root there are 2 plays in the game (local plays) by White, 
none by Black. That gives us the Right scaffold 



v = -6 + 2t . (1)

To reach 0 there is one local Black play and one local White play, which gives us the 
Left scaffold 



v = 0 . (2) 

The ‘e’ tells us that there is a mast at B. Since it and 0 are both Left options from B, 
we know that we use the Left wall of the thermograph of B in deriving the final ther- 
mograph. The mast of B has the equation 

v = -3 + t . (3) 

Of course, the mast of B would be vertical in its own thermograph, but not in the 
thermograph of A. Figure 4 shows the construction of the Left wall of B, which is 
indicated by thickness. 




Fig. 4. Left wall of B 



Now we can construct the thermograph for A. Its Right scaffold is the Left wall of 
B, and its Left scaffold is the line 

v = 2 - t . (4)

Figure 5 shows the construction of the thermograph of A. The thick lines form the 
thermograph. 
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Fig. 5. Thermograph of { 2 || 0 | —6 } 

The scaffolds intersect at a temperature of 2 and a score of 0, the temperature and 
mean value of the game. The mast is the line 

v = 0 . (5) 
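The bottom-up derivation for this tree can be reproduced numerically. The sketch below represents each thermographic line v = m + (w - b) t as an (intercept, slope) pair. The helper names `line` and `intersect` are hypothetical, and the code covers only this example, not a general thermograph routine.

```python
from fractions import Fraction


def line(m, white_plays, black_plays):
    """Thermographic line v = m + (w - b) * t, stored as (intercept, slope)."""
    return (Fraction(m), white_plays - black_plays)


def intersect(first, second):
    """Return (t, v) at which two non-parallel lines meet."""
    (m1, s1), (m2, s2) = first, second
    t = (m2 - m1) / (s1 - s2)
    return (t, m1 + s1 * t)


# Node B, reached after one White play from the root.
right_scaffold_b = line(-6, 2, 0)      # two White plays reach -6: v = -6 + 2t
left_scaffold_b = line(0, 1, 1)        # one play each reaches 0:  v = 0
t_b, v_b = intersect(right_scaffold_b, left_scaffold_b)
mast_b = (v_b - t_b, 1)                # slope 1 through (t_b, v_b): v = -3 + t

# Root A: Left scaffold v = 2 - t against the Left wall of B.
left_scaffold_a = line(2, 0, 1)
t_a, v_a = intersect(left_scaffold_a, left_scaffold_b)
```

Here `(t_b, v_b)` is (3, 0), `mast_b` is (-3, 1), and `(t_a, v_a)` is (2, 0). Because `t_a` is below `t_b`, the intersection lies on the scaffold part of B's Left wall rather than on its mast, which is why the mast of B does not affect the result.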



Playing through a Node. If White plays first from A at temperature 2, Black’s 
correct response is to play to 0 immediately. White plays through B. In go 
parlance, White’s play is sente. If we knew that White played through B, we 
could dispense with the calculation of the mast of B. The thermograph does 
not depend on it. 



2.2 Ko Thermographs 

Figure 6 shows a simple ko. Since the Black stone below ‘a’ has only one liberty, 
White can take it. But after White captures the Black stone, the capturing stone has 
only one liberty, and Black could take it back, if the ko rules did not prohibit the re- 
capture. 




Fig. 6. Ko 

If Black captures the two White stones, the score is 5; if White captures the Black 
stone and later fills, the score is -4. Figure 7 shows the game graph of this ko, on the 
left. The original position is labeled ‘A’ and the position after White takes is labeled 
‘B’. The U-curve between A and B indicates the ko. 
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Fig. 7. A simple ko graph and its trees 



To draw the thermograph of A we derive Left and Right and/or game trees from 
the game graph and the ko rules. In Figure 7 the trees for this graph are on the right. 
The Prolog program in the appendix converts a game graph to its and/or trees. 

In the Right tree White plays from A to #B. The ‘#’ indicates that there is a ko ban
at B. A ko-banned node does not form a mast (unless it is terminal), because a play in 
the environment removes the ko ban, and that changes the game. (If a position is ter- 
minal, there is no environment left, not even a dame.) 

Next Black plays in the environment. Now White has two options, one to -4, and 
another environmental play. Since the first environmental play removed the ko ban, 
the second one establishes a mast at B. Next Black plays to #A, where the tree stops. 

That seems unusual, to say the least. But after Black plays to #A, an environmental 
play would simply return to a position equivalent to one after an initial environmental 
play from the root, and yield no thermographic information. (Note that, because of 
the intervening environmental plays, it would play to a position equivalent to that 
after 3 alternating environmental plays from the root.) 

Working bottom up, we first evaluate #A. It has a value only if it is terminal, so we 
assume that it is. Since White has no play, we might evaluate it the same as the com- 
binatorial game { 5 | } = 6. Different go rules evaluate terminal ko positions differ- 
ently. The value of 6 is consistent with territorial counting under American Go Asso- 
ciation rules. Having no useful play, not even a dame , White would have to fill in a 
point of his own territory, sacrifice a stone, or pass and surrender a stone; each choice 
would cost 1 point. In this paper I shall use AGA territorial rules. 

The scaffolds for B have the equations 



v = 6 , (1)

v = -4 + 2t . (2)

They intersect at temperature 5 to produce the mast

v = 1 + t . (3)



Since White plays from B, we use the Right wall of the thermograph for deriving 
further thermographs. Figure 8 shows the graphical derivation of the Right wall of B. 
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Note that B would have a quite different thermograph if it were the root. That is 
typically the case for subsidiary ko nodes in the and/or tree. 





Fig. 8. The Right wall of B 



As a ko-banned node, #B does not form a mast. The Right wall of B is its Left 
wall. 

We are now ready to form the thermograph for A. Its Right scaffold is the Left 
wall of #B and its Left scaffold is the line 

v = 5 - t . (4)




Fig. 9. Simple ko thermograph 



Figure 9 shows the derivation of the thermograph. The ko has a temperature of 3 
and a mast value of 2. (The mast value and mean value of a ko do not necessarily 
coincide.) 
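The two intersections behind these values can be written out explicitly. The following is a worked check of the arithmetic, using only the scaffold equations stated for B and A. The remark about the mast's slope is an interpretation of the line equation v = m + (w - b) t, not a statement taken from the text.

```latex
\begin{align*}
\text{Scaffolds of } B:\quad & 6 = -4 + 2t \;\Longrightarrow\; t = 5,\ v = 6, \\
& \text{so the mast through } (t, v) = (5, 6) \text{ with slope } 1 \text{ is } v = 1 + t. \\
\text{Scaffolds of } A:\quad & 5 - t = -4 + 2t \;\Longrightarrow\; 3t = 9 \;\Longrightarrow\; t = 3,\ v = 5 - 3 = 2.
\end{align*}
```

Since the second intersection occurs at t = 3, below the temperature 5 at which B's scaffolds meet, it falls on the scaffold portion of the wall of B, consistent with the observation that the value of #A makes no difference.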

The value of #A made no difference. That is typical, as it must be at least as good 
for Black as A. White’s environmental play from B is senseless. 
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2.3 Pruning Redundant Branches 

To eliminate redundancies in the and/or game tree, we disallow sequences in which 
an environmental play follows the root node or an environmental play returns to an 
established mast. 

An environmental play establishes a mast for all previous nodes which are not ko- 
banned. The play sequence of nodes A...e...A e is equivalent to A e e if the number of 
local Black and White plays are the same between the two occurrences of A. (It is 
possible to return to the same board position after an unequal number of Black and 
White plays. But then the net number of captured stones differs, as well. Such posi- 
tions should not be considered the same node in the game graph.) 

An environmental play cannot establish a mast for a ko-banned node, because it 
destroys the ko ban. A ko-banned node may have a mast only if it is terminal. Its ter- 
minal value depends on the rules. 

In the Prolog program the predicate mast checks whether the mast for a node has 
been established: 



    mast([Root], Root) :- !.
    mast([e | History], Node) :- !, mast1(History, Node).
    mast([_ | History], Node) :- mast(History, Node).

    mast1([Node | _], Node) :- !.
    mast1([e | History], Node) :- !, mast2(History, Node).
    mast1([_ | History], Node) :- mast1(History, Node).

    mast2([Node | _], Node) :- !.
    mast2([# Node | _], Node) :- !.
    mast2([_ | History], Node) :- mast2(History, Node).



History is the ancestor list of Node in reverse order. Searching back, if mast finds
no play in the environment (‘e’), a mast is established only for the root. If it finds one
‘e’, a mast is established for any previous occurrence of Node which is not ko-banned.
If it finds a second ‘e’, a mast is established for any previous occurrence of
Node.
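For readers less familiar with Prolog, the same search can be sketched in Python. This is a hypothetical port, not part of the original program: it assumes nodes are strings, that a ko-banned node is written with a leading `#`, and that the history list is given most recent entry first, as in the Prolog version.

```python
ENV = "e"  # marker for a play in the environment


def mast2(history, node):
    """After two environmental plays: any earlier occurrence counts, banned or not."""
    return any(item == node or item == "#" + node for item in history)


def mast1(history, node):
    """After one environmental play: only an unbanned earlier occurrence counts."""
    for i, item in enumerate(history):
        if item == node:
            return True
        if item == ENV:
            return mast2(history[i + 1:], node)
    return False


def mast(history, node):
    """No environmental play seen yet: a mast is established only for the root."""
    for i, item in enumerate(history):
        if i == len(history) - 1 and item == node:
            return True          # node is the root (oldest entry)
        if item == ENV:
            return mast1(history[i + 1:], node)
    return False
```

For instance, `mast(["e", "B", "A"], "B")` holds, `mast(["e", "#B", "A"], "B")` does not, and `mast(["e", "x", "e", "#B", "A"], "B")` holds because a second environmental play has intervened.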



2.4 Double Ko Life 

In Figure 10 Black is alive in double ko. Let us call this position J. White can play
at ‘a’ to position D, threatening to take Black’s stones. But Black can then play at ‘b’
to position K, and White is banned from taking back. Figure 11 shows the double ko
graph and tree.
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Fig. 10. Double ko life 




Fig. 11. Double ko graph and tree 



To derive the thermograph, let us start in the bottom right of the tree. First we
evaluate the ko-banned node, #J, as terminal. { 8 | } = 9 in combinatorial game theory, 
and both AGA and Japanese rules agree. We now find the thermograph for K. The 
Right scaffold is 

v = 9 . (1) 

The Left scaffold is 



( 2 ) 
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Figure 12 shows the derivation. 




Fig. 12. Thermograph of K 



Since the Left scaffold lies to the Right of the Right scaffold, this is a terminal po- 
sition. As with #J, the terminal value is 9. The mast of K is the Right wall of #K. 

From D Black can play to #K or #J. Each one has a value of 9. The Left scaffold 
of D is equation (1). The Right scaffold is 

v = -21 + 2t . (3) 

In Figure 13 we find the Right wall of D. 




Fig. 13. Right wall of D 



The mast of D is 



v = -6 + t . (4) 

The thermograph of #D has no mast. Since Black has the move, it is the maximum 
of the Right walls of D and #K. The tree below this #K node is the same as below the 
other #K node, so the thermograph is the same: the vertical line at 9. 
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In Figure 14 we find the thermograph of #D. 




Fig. 14. Thermograph of #D 

Below temperature 15 line (1) dominates; above that line (4) does. 
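The maximum taken at #D can be checked with a short sketch. It assumes, following the equations above, that the Right wall of D is line (3) up to temperature 15 and the mast (4) above it, and that the wall of #K is the vertical line (1) at 9. The function names are hypothetical.

```python
def right_wall_d(t):
    """Right wall of D: scaffold v = -21 + 2t up to t = 15, then mast v = -6 + t."""
    return -21 + 2 * t if t <= 15 else -6 + t


def wall_banned_d(t):
    """Wall of #D: Black has the move, so take the better of #K (v = 9) and D."""
    return max(9, right_wall_d(t))
```

At t = 10 and t = 15 the result is 9, while at t = 20 it is 14, so the vertical line governs at or below temperature 15 and the mast of D above it.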

Now we can find the thermograph of J. Its Right scaffold is the thermograph of 
#D. Its Left scaffold is line (2). Figure 15 shows the derivation. 




Fig. 15. Thermograph of double ko life 

Again the Left scaffold is to the right of the Right scaffold. J is terminal, with a mast 
value of 9. 



2.5 Molasses Ko 

Molasses ko (Figure 16) is a curiosity, reported only once in go history [3]. If neither
player has a sufficiently large ko threat, a molasses ko may slow the game down,
causing it to drag on and on. If White has the move in position A, he plays at W1,
Black takes the ko at B2, White takes two stones at W3, and Black takes the ko at B4,
reaching position F, and then White plays elsewhere. Then Black continues the ko in
similar fashion, and four moves later returns to A. The player who starts the molasses
ko by threatening two stones is the one who must find a ko threat to win it. If neither
player can win it, the game slows down, with one play elsewhere for every four plays
in the ko.
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A 



F 



Fig. 16. Molasses ko 



What happens when neither player wishes to play elsewhere? That depends on the 
rules. If the first player to pass is at a disadvantage, the molasses ko is truly intermin- 
able. Both players fill in territory until that results in the sacrifice of a group. But 
under AGA rules both players can safely pass, and the molasses ko remains on the 
board as a seki, worth 0 points of territory. 







Fig. 17. Molasses ko game graph 



In analyzing this ko, it helps to realize that we play through B, C, G and H (Figure 
17). From A it takes Black two moves to capture White for 60 points. From C it takes 
only one move for 61 points. If White does not continue to D, he should not have 
played to B in the first place. A similar argument applies to F and G. By not finding 
masts for those nodes we can save ourselves some trouble. Then we get the game tree 
in Figure 18. 












William L. Spight 




Fig. 18. Molasses ko game tree 



This tree yields the thermograph in Figure 19. 




Fig. 19. Molasses ko thermograph 



The temperature of the molasses ko is 24 and its mast value is 12. The Right wall 
shows that when t < 12 and White plays first, Black will reply and set the molasses ko 
in motion. When t > 12 Black will respond in the environment. 



3 Comparison with Generalized Ko Thermography 

Players frequently contest a ko by making threats which the opponent must answer, 
and then taking the ko back. With correct play, the player who has enough sufficient- 
ly large threats can win the ko. 
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Berlekamp [1] utilizes the concept of komaster to extend thermography to ko posi- 
tions. The komaster of a ko may win it, but, once she has taken the ko to repeat the 
global position, she must make another local move immediately. (Berlekamp’s econ- 
omist rules permit two consecutive plays by the same player.) 

The komaster rule retains desirable properties of classical thermography: 

• For a sum of games which does not include multiple kos which are active at the 
same time on the same board, the mast value of the sum is the sum of the mast val- 
ues. 

• The temperature of the sum is at most the temperature of its hottest summand. 

This extension retains neither of these properties. However, by making ko threats
explicit, it allows the exploration of positions which lie between the extremes of komaster,
and of intricacies of the relationships among threats and kos.



3.1 Modeling Komaster 

Berlekamp [1] discovered that some ko positions are hyperactive; the mast value de- 
pends on who is komaster. Whether a position is hyperactive is important informa- 
tion. It is desirable that this extension model komaster conditions. 

However, with multiple kos active at the same time, it is impossible for one player 
to be the komaster for all of them. If she takes one, her opponent is free to take an- 
other. We can model only a weaker version of komaster. 

For any given ko-banned position, the komaster is allowed to break the ko ban 
only once, after which she loses komaster status. She pays no price for this privilege, 
although it is usually correct to continue locally if the opponent responds in the envi- 
ronment. 

Since this method makes all ko threats explicit, we use a dummy threat for each 
position to make a player komaster. We assume that the opponent must answer the 
threat, and that it evaluates to 0 in a position with no ko. 

After the komaster uses the dummy threat, it is impossible to return to any previ- 
ous node, since each one contained the threat. Thus when the komaster takes the ko 
back, her opponent may play in the environment. 



4 Summary 

By determining thermographs through reference to minmax play in a universal en- 
riched environment, we may extend thermography to positions with multiple kos. 
From the game graphs of ko positions we derive a pair of and/or trees and apply 
thermography to them. 

While this extension sacrifices some valuable properties of classical thermography, 
it allows us in theory to evaluate any go position which does not lead to a hung game, 
and to explore relations among kos and threats. 
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Appendix: Prolog Program for Game Tree Conversion 



/***********************************************************
 * This program converts a combinatorial game graph        *
 * to a pair of and/or trees                               *
 ***********************************************************/

:- op(600, fx, #).
:- op(200, xfx, ::).
:- op(700, xfx, <::).
:- op(700, xfx, ::>).



/***********************************************************
 * graph_tree(Node, Graph, LeftTree, RightTree)            *
 * converts a game graph, Graph, to a thermographic tree   *
 * with Node as the root. The tree consists of two left-   *
 * right (and/or) trees, LeftTree and RightTree, which     *
 * indicate alternating play.                              *
 ***********************************************************/

graph_tree(Node, Graph, LeftTree, RightTree) :-
    graph_tree(left, Node, Node, [], Graph, LeftTree),
    graph_tree(right, Node, Node, [], Graph, RightTree).
    % The empty list ([]) is the history (ancestor list) of
    % Node.

graph_tree(_, Node, _, _, _, Node) :- number(Node), !.
    % If Node is a number, it is the Tree.
graph_tree(Direction, Node, Root0, History, Graph, Tree) :-
    !, followers(Direction, Node, Graph, Followers),
    % Find the immediate Followers in Direction from Node
    % in Graph.
    kobanned(History, Root0, Root, Followers, Children),
    % The provisional Root (Root0) and Followers map to
    % the Root and its Children.
    other_way(Direction, OppositeDir),
    branches(OppositeDir, Node, [Root | History], Graph,
             Children, Branches),
    % Each Child is the root of a sub-tree, Branch.
    tree(Direction, Root, Branches, Tree).
    % Tree is derived from the Root, the Direction, and the
    % Branches.



/***********************************************************
 * followers(Direction, Node, Graph, Followers) finds      *
 * the immediate Followers in Direction from Node in       *
 * Graph. Graph is a list of Nodes and Followers of the    *
 * form Node - LeftFollowers :: RightFollowers.            *
 ***********************************************************/

followers(_, _, [], []) :- !.
followers(left, Node, [Node - LeftFollowers :: _ | _],
          LeftFollowers) :- !.
followers(right, Node, [Node - _ :: RightFollowers | _],
          RightFollowers) :- !.
followers(Direction, Node, [_ | Nodes], Followers) :-
    followers(Direction, Node, Nodes, Followers).

/***********************************************************
 * kobanned(History, Root0, Root, Followers, Children)     *
 * checks History to select Children which are not ko-     *
 * banned from among Followers. In case of a ko ban        *
 * Root = # Root0. Any ko ban depends on the ko rules      *
 * used. This version uses the positional super ko rule,   *
 * which prohibits any repetition of a whole board         *
 * position.                                               *
 ***********************************************************/

kobanned(_, e, e, Followers, Followers) :- !.
    % An environmental play (e) removes any ko ban.
kobanned(_, Root, Root, [], []) :- !.
kobanned(_, _, _, [], []) :- !.
kobanned(History, Root0, Root, [Follower | Followers],
         Children0) :-
    repeated(History, Follower, YesNo),
    map(YesNo, Root0, Root, Follower, Children0, Children),
    kobanned(History, Root0, Root, Followers, Children).

/***********************************************************
 * repeated(History, Follower, YesNo) searches for         *
 * Follower in History, which is in reverse order. If      *
 * Follower occurs before an environmental play (e) is     *
 * encountered, there is a ko ban.                         *
 ***********************************************************/

repeated([], _, no) :- !.
repeated([e | History], Follower, YesNo) :-
    repeated1(History, Follower, YesNo).
repeated(_, Follower, no) :- number(Follower), !.
repeated([Follower | _], Follower, yes) :- !.
repeated([# Follower | _], Follower, yes) :- !.
repeated([_ | History], Follower, YesNo) :-
    repeated(History, Follower, YesNo).



/***********************************************************
 * repeated1(History, Follower, YesNo)                     *
 * ascertains the node at which an environmental play      *
 * (e) was made. If it was Follower, then there is a ko    *
 * ban; otherwise there is not.                            *
 ***********************************************************/

repeated1([e, # Follower | _], Follower, yes) :- !.
repeated1([# Follower | _], Follower, yes) :- !.
repeated1([Follower | _], Follower, yes) :- !.
repeated1(_, _, no) :- !.

map(no, _, _, Follower, [Follower | Children], Children)
    :- !.
map(yes, Root, # Root, _, Children, Children) :- !.
    % If there is a ko ban, the Root is marked and
    % the Follower is not a Child.

other_way(left, right) :- !.
other_way(right, left) :- !.

/*********************************************************
* branches(Direction, Node, History, Graph, Children,
* Branches). There is a Branch for each Child. In
* addition, there is one for a play in the environment
* (e) if the mast for Node has not been established.
* On an established mast an environmental play is
* redundant.
*********************************************************/

branches(_, Node, History, _, [], []) :-
    mast(History, Node), !.
branches(Direction, Node, History, Graph, [], [Branch]) :-
    !, % No mast established. Play in environment.
    graph_tree(Direction, Node, e, History, Graph, Branch).
branches(Direction, Node, History, Graph, [Child | Children],
         [Branch | Branches]) :-
    graph_tree(Direction, Child, Child, History, Graph, Branch),
    branches(Direction, Node, History, Graph, Children, Branches).



/*********************************************************
* mast(History, Node) determines if the current Node is
* on an established mast. A mast is established if
* 1) Node is the Root, or
* 2) there has been one environmental play (e)
*    between Node and an earlier occurrence, or
* 3) there have been two environmental plays between
*    Node and an earlier occurrence of # Node (ko-
*    banned).
*********************************************************/

mast([Root], Root) :- !.
mast([e | History], Node) :- !, mast1(History, Node).
mast([_ | History], Node) :- mast(History, Node).

mast1([Node | _], Node) :- !.
mast1([e | History], Node) :- !, mast2(History, Node).
mast1([_ | History], Node) :- mast1(History, Node).

mast2([Node | _], Node) :- !.
mast2([# Node | _], Node) :- !.
mast2([_ | History], Node) :- mast2(History, Node).
tree(_, Root, [], Root) :- !.
tree(_, Root, [e], Root) :- !.
tree(left, Root, Branches, Root <:: Branches).
tree(right, Root, Branches, Root ::> Branches).
Abstract. The field of Computer Go has seen impressive progress over
the last decade. However, its future prospects are unclear. This paper 
suggests that the obstacles to progress posed by the current structure of 
the community are at least as serious as the purely technical challenges. 
To overcome these obstacles, I develop three possible scenarios, which 
are based on approaches used in computer chess, for building the next 
generation of Go programs. 



1 A Go Programmer’s Dream 

In January 1998, I challenged the readers of the computer-go mailing list to
discuss future directions for Computer Go:

Assume you have unlimited manpower at your hand (all are 7-Dan in 
both Go and programming) and access to the fanciest state of the art 
computers. Your task is to make the strongest possible Go program 
within say three years. What would you do? 

Many of the answers I received severely criticized all currently used ap- 
proaches, and advocated the development of revolutionary new techniques. I 
do not think such wholesale criticism is justified. In this paper I take a close 
look at the current state of Computer Go, and propose a more systematic use 
of already available, proven techniques. I claim that this will already lead to 
substantial progress in the state of the art. Writing programs for Go has turned 
out to be much more complex than for other games. The way Go programs are 
developed must adapt accordingly: it is necessary to scale up to larger team 
efforts. 

The paper is organized as follows: Section 2 analyzes the state of Computer 
Go, identifies some of the strengths and weaknesses of the current generation 
of programs, and outlines a plan which draws on existing technology but still 
promises substantial progress within a few years. In section 3, I introduce three 
development models that have been used in chess, and discuss how to adapt 
them to Computer Go. Finally, section 4 introduces promising topics for long- 
term research. 



H.J. van den Herik, H. Iida (Eds.): CG’98, LNCS 1558, pp. 252 ff., 1999.
© Springer-Verlag Berlin Heidelberg 1999
2 The State of Computer Go

To the casual observer, Computer Go may seem to be in fine shape. However, 
a number of problems threaten the future prosperity of the field. Many of these 
problems are rooted in the current structure of the Computer Go community. 

2.1 The Computer Go Community 

Recent years have seen many developments in Computer Go. Good progress has 
been made in the tournament scene and the internationalization of the field. 
However, in several ways the field has remained immature: programs are con- 
structed on an ad hoc basis, results are held back for commercial reasons, and the 
lack of support for new researchers willing to enter the field is a severe problem. 



The Tournament Scene. Several annual tournaments have been established.
Two world championships, the Ing foundation’s International Computer Go 
Congress and the newer FOST cup, continue to attract the elite of Go pro- 
grams from all over the world. The annual European and the North American 
Go congresses host smaller, local computer championships. The internet-based 
Computer Go Ladder allows Go programmers from all over the world to
compete with a wide variety of opponents. 



An International Activity. While human Go players are still concentrated
mainly in Asia, Computer Go has become a truly international activity, with 
serious programs being developed in at least a dozen countries. It is not unusual 
to see the first five places in a tournament taken by competitors from many dif- 
ferent countries. Several strong new commercial programs have been developed, 
and the total number of programs to participate in tournaments easily exceeds 
a hundred. 

There seems to be renewed interest from the general AI and the computer 
games community. After the world-championship performance of programs in 
games such as chess, checkers or Othello, many eyes have turned to Go as the 
‘final frontier’ of computer-game research. 



Lack of Support for New Researchers. Only a few individuals or institutions
have sufficient resources to commit to a full-scale Go programming effort.
Indeed, most new Go programmers have to start almost from scratch. Because
of the overhead in getting started, it is very hard for a smaller project, such as
a master’s thesis, to make a significant contribution.

Given the complexity of the task, the supporting infrastructure for writing
Go programs should offer more than the analogous infrastructure did for other
games such as chess. However, the Go infrastructure is far inferior. The playing
level of publicly available source code is far below that of state of the
art programs. Quality publications are scarce and hard to track down. Few
of the top programmers have an interest in publishing their methods. Whereas
contributions on computer chess or general game-tree search regularly appear in
mainstream AI journals, technical publications on Computer Go remain confined
to the proceedings of specialized conferences. The most interesting developments
can be learned only by direct communication with the programmers; unfortunately,
they are never published.



2.2 State of Go Programs 

Computer Go constitutes a formidable technical challenge. Existing programs 
suggest the following difficulties: 

— A competitive program needs 5-10 person-years of development. 

— A typical program consists of 50-100 modules. 

— The weakest of these components determines the overall performance.

— The best programs usually play good, master level moves, but their perfor- 
mance level over a full game is much lower because of the remaining blunders. 

— A number of standard techniques have emerged. However, there is no single 
program which incorporates most of the currently existing successful Com- 
puter Go techniques. 

Let me discuss the strengths and weaknesses of the programs in some detail. 



Special Strengths of Current Programs. Several of the leading programs
do one specific task better than the others. For example, Handtalk excels in
overall integration, playing good shape, and in knowledge about group attack
and defense. Go4++, on the other hand, is most efficient in taking territory,
plays the fewest unnecessary ‘wasted’ moves per game, and has an extensive 
special purpose joseki book for opening move sequences. Further, Go Intellect is 
a mature program with strong tactical fighting and overall Go knowledge, while 
GoTools is a high dan-level Life and Death solver specialized for completely 
surrounded areas. 



Incompleteness of Existing Programs. Unfortunately, no single program
incorporates most of the currently existing successful Computer Go
techniques. A next-generation program would need to recreate and integrate
most of these individual capabilities.

The sheer number of necessary components shows why single-person projects
are inadequate. Even assuming only one month for each module, a
reasonably complete program will take four to five years to build, before
any time is spent on testing and system integration.
Disappointing Sustained Performance. The difference between first-play and
sustained long-run performance of programs against human players is drastic:
programs typically do well in their first game against an opponent inexperienced 
in playing computers. For example, world champion Handtalk has won Ing’s 11 
stone challenge matches against high dan-level human players, and has beaten a 
1-dan player without handicap in an exhibition game at the FOST cup. However, 
if allowed a few practice games, humans soon spot and exploit a program’s 
weaknesses. The same program that once beat the 1-dan regularly loses against 
a well-prepared 5-kyu player, even when receiving huge handicaps of up to 20 
stones. 



2.3 What To Blame? The Model or Its Implementations? 

What is the reason behind the irregular performance of programs? How can Go 
programs look so good on one day and so pathetic on the next? One theory is that 
there is a fundamental problem in the underlying models. In this view, current 
Go programs are not able to capture the true spirit of Go: they may play good- 
looking moves, but do so without any ‘real understanding’ of the game, which 
inevitably shows sooner or later. The alternative view is that the current model 
is basically sound and sufficient, but programs suffer from incomplete or buggy 
implementations.



2.4 A Model of Go Program Components 

Fotland’s ‘Computer Go design issues’ lists about sixty components of current
Go programs, and can be considered as defining a ‘standard model’. State of
the art programs contain many of the modules described in Fotland’s list. I will 
use the following simplified classification of development tasks for a Go program: 

— Mathematical foundations and Go theory 

— Knowledge representation and data structures 

— Search methods 

— Global move decision 

— Software engineering and testing 

— Automatic tuning and machine learning 

I will briefly discuss the issues for each group of tasks, describe the current 
state of their implementation, and point out promising areas for short-term 
research and development. A program implementing this ‘standard model’ as
completely and accurately as possible would serve as an interesting
milestone and allow a more meaningful analysis of its strengths and weaknesses 
than is currently possible. 



Mathematical Foundations and Go Theory. Theoretical techniques applicable
to Computer Go range from abstract mathematics for group safety and
endgame calculation to Go-specific knowledge such as the semeai formula.
A detailed discussion is available in the literature.

Standard game tree-searching methods are well established for goal-oriented 
tactical search in Go. In addition, new search methods such as proof-number 
search have been successfully applied in at least one commercial program.
The many potential benefits offered by theory have only partially been applied 
in current programs. 



Knowledge Representation and Data Structures. Most programs use a
hierarchical model for board representation. Low-level concepts are blocks of 
adjacent stones and connections or ‘links’ between stones. Chains, groups and 
territories are higher-level concepts built from the primitives. Pattern matching 
is used to find candidate moves. Knowledge representation has been the focus of 
the majority of Computer Go research to date, and has reached a sophisticated 
level. 
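The lowest level of this hierarchy can be illustrated with a short sketch. It is not taken from any of the programs discussed; the board encoding (a list of strings using 'X', 'O' and '.') and the function name `find_blocks` are assumptions made for the example. It groups orthogonally adjacent stones of one color into blocks and records each block's liberties, the primitives from which chains and groups would be built.

```python
from collections import deque

def find_blocks(board):
    """Return a list of (color, stones, liberties) for every block on the board.

    board: list of equal-length strings; 'X' and 'O' are stones, '.' is empty.
    stones and liberties are sets of (row, col) coordinates.
    """
    rows, cols = len(board), len(board[0])
    seen = set()
    blocks = []
    for r in range(rows):
        for c in range(cols):
            color = board[r][c]
            if color == '.' or (r, c) in seen:
                continue
            # Breadth-first flood fill over same-colored neighbors.
            stones, liberties = set(), set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                stones.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if not (0 <= ny < rows and 0 <= nx < cols):
                        continue
                    if board[ny][nx] == '.':
                        liberties.add((ny, nx))
                    elif board[ny][nx] == color and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            blocks.append((color, stones, liberties))
    return blocks

example = [
    ".XX.",
    ".OX.",
    ".O..",
    "....",
]
for color, stones, liberties in find_blocks(example):
    print(color, len(stones), "stones,", len(liberties), "liberties")
```

Higher-level concepts such as chains or groups would then be computed from these blocks, for instance by merging blocks that share a secure connection.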

I expect that the quality of knowledge incorporated in programs will gradu- 
ally be refined. The quantity of knowledge is rising dramatically due to large scale 
pattern learning methods, which are becoming increasingly popular.
However, it is unclear how computer-generated pattern databases can reach a 
quality comparable to human-generated ones. For comparison, it would be fasci- 
nating to develop a large corpus of human Go knowledge, to try to identify and 
encode the pattern knowledge of Go experts. 



Search Methods. Three types of search are commonly used in Computer Go:
single-goal, multiple-goal, and full-board search. 

Specialized searches that focus on achieving a tactical goal are some of the 
most important components of current Go programs. A major advantage of goal- 
directed search over full-board search is that evaluation consists only of a simple 
test, which is much faster than full territory evaluation. One use of goal-directed 
search is to propose locally interesting moves to a selective global move decision 
process. I expect the use of goal-directed minimax searches to expand widely in 
volume and scope. 

Single-goal search uses standard game tree searching techniques for finding 
the tactical status of blocks, chains, groups, territories, or connections. Knowing 
this status improves the board representation and is a precondition for creating 
a meaningful scoring function. 
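The following sketch shows why the evaluation in a goal-directed search reduces to a simple test. It is a generic yes/no minimax, not the algorithm of any particular program; the callback names (`moves`, `play`, `goal_reached`, `goal_failed`) and the toy capture race used to exercise it are hypothetical.

```python
def goal_search(state, attacker_to_move, depth, moves, play,
                goal_reached, goal_failed):
    """Return True if the attacker can force the goal within `depth` plies."""
    if goal_reached(state):
        return True
    if goal_failed(state) or depth == 0:
        return False  # unresolved lines are counted as failures for the attacker
    outcomes = (
        goal_search(play(state, move), not attacker_to_move, depth - 1,
                    moves, play, goal_reached, goal_failed)
        for move in moves(state, attacker_to_move)
    )
    # The attacker needs one successful move; at a defender node the goal
    # must still be reached after every possible reply.
    return any(outcomes) if attacker_to_move else all(outcomes)

# Toy capture race (hypothetical): state = (liberties, extensions_left).
def toy_moves(state, attacker_to_move):
    if attacker_to_move:
        return ["fill"]
    return ["pass", "extend"] if state[1] > 0 else ["pass"]

def toy_play(state, move):
    liberties, extensions = state
    if move == "fill":
        return (liberties - 1, extensions)
    if move == "extend":
        return (liberties + 3, extensions - 1)
    return state

def captured(state):
    return state[0] == 0

def escaped(state):
    return state[0] >= 4

print(goal_search((2, 0), True, 5, toy_moves, toy_play, captured, escaped))
print(goal_search((2, 1), True, 5, toy_moves, toy_play, captured, escaped))
```

The leaf test is a cheap comparison on the state, whereas a full-board search would have to estimate territory at every leaf.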

Examples of targets for single-goal search are given in the table below.
Current programs implement many but not all of these goal-oriented searches. A 
complete implementation of all basic single-goal searches seems to be a straight- 
forward development task. 
Target                Reference
Single block capture  [illegible]
Life and death        [illegible]
Connect or cut        [illegible]
Eye status            -
Local score           (Goliath)
Safety of territory   [illegible]
Semeai                [illegible]



Multiple-goal search tries to achieve a combination of basic goals, such as 
capturing at least one of a set of blocks. The implementation of such tasks is 
rudimentary in most programs. Simple double threats such as double atari are 
usually built in as special cases. However, many standard Go strategies can 
be understood as more general double threats. The following table lists a
few common themes. A more sophisticated architecture for multipurpose
strategic planning has also been described in the literature.



Target(s)           Goal(s)
Multiple blocks     save all blocks
Territory boundary  capture or break through
2 groups            splitting and leaning attacks
Group               attack and make territory
Group               live locally or break out
Group               make eyes or win semeai



Searching each possible goal combination leads to combinatorial explosion in 
the number of searches. Heuristics can be used to select promising goal combi- 
nations for search. I expect a lot of progress on this problem over the next few 
years. 
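To make the combinatorial explosion concrete, the sketch below counts the goal combinations for a given number of basic goals and then applies one possible selection heuristic. The heuristic (keep only pairs of goals whose focal points lie close together on the board) and all names are assumptions chosen for illustration, not a method described in the text.

```python
from itertools import combinations
from math import comb

def count_goal_combinations(n_goals, max_size):
    """Number of ways to combine between 2 and max_size of n_goals basic goals."""
    return sum(comb(n_goals, k) for k in range(2, max_size + 1))

def promising_pairs(goals, max_distance):
    """Keep pairs of goals whose focal points are within max_distance.

    goals: dict mapping a goal name to the (row, col) of its focal point.
    Distance is measured as Manhattan distance.
    """
    pairs = []
    for a, b in combinations(sorted(goals), 2):
        (ra, ca), (rb, cb) = goals[a], goals[b]
        if abs(ra - rb) + abs(ca - cb) <= max_distance:
            pairs.append((a, b))
    return pairs

print(count_goal_combinations(20, 2))  # pairs only
print(count_goal_combinations(20, 3))  # pairs and triples

goals = {"capture A": (2, 2), "cut B": (3, 4), "kill C": (15, 15)}
print(promising_pairs(goals, max_distance=4))
```

With twenty basic goals there are already 190 pairs and 1330 pairs or triples, so some filter of this kind is needed before any search is started.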

Full-board search seeks to maximize the overall evaluation. Because of the
complex evaluation and high branching factor of Go, full-board search has to
be highly selective and shallow. I expect the use of full-board evaluation to
increase steadily along with improving hardware, but without playing the same
dominating role it has played for other games.



Global Move Decision. There is a great variety of approaches to the problem
of global move decision in Go. No single paradigm, comparable to the full-board 
minimax search used in most other games, has emerged. Most programs use a 
combination of the following methods: 

— Static evaluation to select a small number of promising moves 

— Selective search to decide between candidate moves 

— Shortcuts to play some ‘urgent’ moves immediately 
— Recognition and following of temporary goals

— Choice of aggressive or defensive play based on a score estimate 

I expect experimentation to continue, without any clear preference or ‘standard 
model’ emerging. Methods based on combinatorial game theory (section 4.1) 
have the potential to replace more traditional decision procedures. 
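One way to combine the first three methods in the list above is sketched below. The function `choose_move` and its callbacks are hypothetical; real programs differ widely in how they weigh these steps, and temporary goals and score-dependent style are left out.

```python
def choose_move(candidates, static_eval, search_eval, is_urgent, shortlist_size=3):
    """Pick a move from candidates using a three-step decision procedure."""
    # Shortcut: an urgent move is played immediately, without search.
    for move in candidates:
        if is_urgent(move):
            return move
    # Static evaluation selects a small number of promising moves.
    shortlist = sorted(candidates, key=static_eval, reverse=True)[:shortlist_size]
    # Selective search (here just a second, more expensive score) decides.
    return max(shortlist, key=search_eval)

static_scores = {"a": 1, "b": 8, "c": 7, "d": 6}
search_scores = {"a": 100, "b": 2, "c": 9, "d": 5}

print(choose_move(list(static_scores), static_scores.get, search_scores.get,
                  is_urgent=lambda move: False))
```

In this toy run move "a" is never searched because the static evaluation ranks it last, which shows how a weak static evaluation can hide the best move from the search.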



Software Engineering and Testing. A competitive Go program is a major
software development project. Software quality can be improved by using standard
development and testing techniques. A wide variety of game-specific
testing methods are available, including test suites, auto-play and internet-based 
play against human opponents. 
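A test suite of the kind mentioned here can be as simple as a list of positions with the moves considered acceptable in each. The sketch below assumes a hypothetical engine interface (a callable from position to move) and an equally hypothetical suite format; it only illustrates the bookkeeping.

```python
def run_test_suite(engine, suite):
    """Run engine on every test position and collect the failures.

    suite: list of (name, position, accepted_moves) entries.
    Returns (number_passed, failures), where failures lists (name, move_played).
    """
    failures = []
    for name, position, accepted_moves in suite:
        move = engine(position)
        if move not in accepted_moves:
            failures.append((name, move))
    return len(suite) - len(failures), failures

# Stand-in engine: a lookup table from position identifiers to moves.
canned_answers = {"pos1": "C3", "pos2": "B2", "pos3": "D4"}
suite = [
    ("ladder", "pos1", {"C3"}),
    ("eye shape", "pos2", {"B1", "B2"}),
    ("semeai", "pos3", {"A1"}),
]
passed, failures = run_test_suite(canned_answers.get, suite)
print(passed, "passed; failures:", failures)
```

Rerunning such a suite after every change gives an early warning when a patch in one module breaks behavior that depended on another.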

It is hard to judge objectively, but I suspect there is much room for improve- 
ment in this area. Most leading programs have been in continuous development 
for ten or more years. Many of these programs may be reaching a level of inter- 
nal complexity where it is difficult to make much progress. Originally designed 
for machines a thousand times smaller and slower, programs have grown layer 
upon layer of additions, patches and adjustments. Some programs have been 
rewritten from scratch in the meantime, but this is a daunting and extremely 
time-consuming task.



Automatic Tuning and Machine Learning. Machine learning techniques
are currently used in only a few programs. However, parameter tuning
and book learning techniques seem to be in more frequent use. I predict that 
the applications of machine learning techniques in Computer Go will increase, 
for example for fine-tuning the performance of complex programs with many 
components. 



3 A Research Plan for Computer Go 



What is the real limit on the performance of Go programs imposed by current 
models? Should research focus on developing new models or on improving the 
implementation of current ones? To answer these important questions, I propose 
the following three lines of research and development: 

— Detailed analysis of current programs’ errors 

— A Dreihirn experiment 

— Large scale Go programming projects 

The first two methods are designed to better understand the problems of 
current programs. The third proposal addresses testing the limit of current tech- 
nology. 
3.1 Detailed Error Analysis

Detailed error analysis of current programs can draw upon a wealth of available 
game records. Many classifications of mistakes are possible. For our purposes,
it may be sufficient to assign errors to one of two broad groups: lack of basic 
understanding and lack of efficiency. Lack of basic understanding can be defined 
as the failure to identify the current focus of a game. Examples are attacking or 
defending the wrong group, ignoring threats or double threats, or making wrong 
life and death judgments. 

Efficiency errors are less drastic individually but have a large cumulative 
effect. Mistakes belonging to this category are: making overconcentrated shapes, 
taking gote when a sente move is available, or achieving the correct main goal 
without considering secondary effects. An example of the latter kind of problem 
is saving a group by connecting it on a neutral point, instead of living more 
profitably by surrounding enough territory to make two eyes. Another example 
of efficient play is kikashi: playing profitable forcing moves before going back to 
make a necessary but unattractive defense.

Research along these lines will aim to develop automatic methods for per- 
forming such analyses, by using statistical techniques and developing suitable 
test position collections. 
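The statistical side of such an analysis could start from annotated game records. The sketch below assumes a hypothetical annotation format, one (move number, error label, estimated points lost) entry per mistake, and a made-up label vocabulary; it tallies the two broad groups defined above.

```python
from collections import defaultdict

# Hypothetical label vocabulary for the two groups of errors.
UNDERSTANDING = {"wrong group", "ignored threat", "life and death misjudged"}
EFFICIENCY = {"overconcentrated", "gote instead of sente", "neutral-point connection"}

def summarize_errors(annotations):
    """Count errors and sum the estimated point loss per error group."""
    summary = defaultdict(lambda: {"count": 0, "points": 0})
    for _move_number, label, points_lost in annotations:
        if label in UNDERSTANDING:
            group = "understanding"
        elif label in EFFICIENCY:
            group = "efficiency"
        else:
            raise ValueError(f"unknown error label: {label}")
        summary[group]["count"] += 1
        summary[group]["points"] += points_lost
    return dict(summary)

annotations = [
    (12, "wrong group", 20),
    (40, "overconcentrated", 3),
    (55, "gote instead of sente", 4),
    (80, "neutral-point connection", 5),
    (101, "ignored threat", 15),
]
print(summarize_errors(annotations))
```

Comparing the count and the point total per group makes the cumulative effect of many small efficiency errors visible next to the rarer, larger errors of understanding.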



3.2 A Dreihirn Match for Go 

In the Dreihirn chess games, a team of two chess computers supervised by a
human ‘boss’ has achieved strong results against chess grandmasters. The team
played markedly better than each individual program, even though the ‘boss’
was a comparatively weak player. The human supervisor was able to select a
promising overall direction of play and avoid some typical computer missteps. 

I propose a comparable experiment in Go, with a team of several Go pro- 
grams supervised by a strong human player. At each move, the human selects 
one of the moves proposed by the programs. This team is tested against a variety 
of opponents, including other programs and humans of different strengths. Such 
a test can serve to establish an upper limit of current program performance, and 
show whether the uneven play is due more to individual bugs in the implementa- 
tions or due to more fundamental limitations of all current programs. If a series 
of games is played, the test would also show if human opponents can adapt as 
quickly to such a system as they adapt to each individual program’s weaknesses. 
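The protocol of the proposed experiment is simple enough to state as code. In the sketch below the engines and the supervisor are hypothetical callables; the only rule enforced is that the supervisor may choose among the proposed moves but may not introduce a move of their own.

```python
def dreihirn_move(position, engines, supervisor):
    """Ask every engine for a move and let the supervisor choose among them."""
    proposals = []
    for engine in engines:
        move = engine(position)
        if move not in proposals:
            proposals.append(move)
    if len(proposals) == 1:
        return proposals[0]  # all engines agree, nothing to decide
    choice = supervisor(position, proposals)
    if choice not in proposals:
        raise ValueError("the supervisor must pick one of the proposed moves")
    return choice

# Stand-in engines and a supervisor who always takes the last proposal.
engines = [lambda pos: "D4", lambda pos: "Q16", lambda pos: "D4"]
print(dreihirn_move("empty board", engines, lambda pos, moves: moves[-1]))
```

Logging how often the engines disagree, and which engine's proposal is chosen, would give additional data on where individual programs go wrong.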



3.3 Outline of an Architecture for Large Scale Go Projects 

From the beginning, most Computer Go projects have consisted of a single pro- 
grammer with occasional assistance from either scientists or Go experts. In recent 
years, a few commercial programs have been developed on a slightly bigger scale, 
with small teams of programmers and managers working on the Go engine and 
user interface. 
I believe that the scale of these projects is not large enough, and that projects
an order of magnitude larger are necessary to produce a qualitative jump in per- 
formance. Section 2 has identified a long list of tasks required to implement 
a complete Go program based on the current ‘standard model’. However, im- 
plementing a successful large scale Go project requires a series of preliminary 
steps. 

— Secure an existing state of the art program to build on, including an easy to 
use basic Go toolkit. 

— Modify the program to increase its usability in a multi-programmer environ- 
ment. 

— Describe the model underlying the program in detail. 

— Document and structure the source code extensively. 

— Define an effective communication method between team members. 

— Implement a well-defined process for subtask assignment, code integration 
and testing. 

3.4 Three Proposals for Large Scale Go Projects 

In chess, three approaches have been taken in recent years that may serve as an 
inspiration for Go: 

— Large company funded teams (Deep Blue) 

— Public domain source code (Gnu Chess, Crafty) 

— University projects (many) 

Plan 1: Large Scale Commercial Project. The Deep Blue chess project
represents a large-scale effort, one order of magnitude larger than typical com- 
petitive chess programs. Its success rests on two pillars: on the technical side, it 
is a complete, mature system, the result of sound engineering firmly based on 
a large amount of previous research. On the organizational and financial side, 
the Deep Blue project was backed by a large company with an interesting new 
marketing strategy. Computer chess was chosen as an advertising vehicle because 
it represents an attractive topic that is tied to deep myths about human and 
machine intelligence. 

Would a similar alliance of research and big business make sense in Go? Who 
would be a potential sponsor, and what would be their interest? In my view it 
would be a world-class company with a strong interest in the Asian market, 
and an ambition to create or reinforce their image as an intellectual leader. The 
company would profit mainly from the publicity generated by exhibition games, 
not from sales of Go software. Given the high regard for Go as an intellectual 
sport, it seems possible to attract a level of attention comparable to that of the 
chess matches, at least in East Asia. In Go, what is an achievable goal that will 
fascinate the masses? World championship level play still seems far in the future. 
Yet a program playing at a sustained 1-dan level, which can beat professional 
players on 9 stones handicap, will be perceived as an intellectual achievement at 
least equal to that of the chess machines. Is it possible in the near future? Let 
us try! 
Plan 2: Public Domain Go Project. Source code for more than a dozen chess
programs is readily available on the internet. The two best-known of these
programs, Gnu Chess and Crafty, have active user groups which are testing, 
discussing or directly improving the program. 

In Go, several public domain projects have been attempted over the years. 
So far, none of these has resulted in a tournament level program. Recently, there 
seems to be renewed interest in such a project, which has generated a large
number of messages on the mailing list.

The characteristics of a public domain Go project are quite different from
those of a funded project and include the following items.

— Greater fluctuation of team members. 

— Less individual commitment, lower work intensity. 

— Low development cost. 

— Difficult moderation and integration tasks. 

The project goal could be to develop a noncommercial, research-oriented 
tool. The program structure should allow small or medium-scale experiments, 
for example in machine learning, to profit from a state of the art Go engine. 
A less ambitious approach would aim at developing only a library of commonly 
used functions. 

Plan 3: University Research Project. Many of the strongest chess programs
are developed at universities. The situation in Go is comparable: about half of 
the current top 20 Go programs have started as student projects. An advantage 
of student projects is that relatively little funding is required, and students can
combine their parts of the overall programming work with their research. 

The main challenge of this approach is to assemble a large group of talented 
students and keep their efforts coordinated over a number of years. Given the 
current distribution of Go players, a large-scale university Go project would 
probably be feasible only in an Asian country. 

4 Some Issues for Long-Term Research in Computer Go 

Compared to the complex reasoning processes of human Go experts, the models 
incorporated in current Go programs are severely limited. A goal of long-term 
research could be to close this gap, either by building more sophisticated models 
or by deriving human-like reasoning capabilities from simple models. 

Another fascinating topic is modeling the high level full-board plans of human
players, or advanced Go concepts such as aji, korikatachi or sabaki. A related
direction for research is evaluation from first principles, using only search
and learning, without relying on human-engineered heuristics.

Long-term machine learning topics are automatic derivation of sophisticated 
Go concepts from first principles, or the learning of patterns along with suitable 
contexts for their application. 

Yet another research topic, addressed in the next subsection, is the applica- 
tion of combinatorial game theory to Computer Go. 
4.1 Combinatorial Game Theory for Computer Go

As a framework for Computer Go, combinatorial game theory has several ad- 
vantages compared to the standard minimax game-playing model. However, the 
finer points of this theory are all but unknown outside the small combinatorial
games community. Several of the tools provided by this theory are well suited 
for analyzing Go and should be used more in Go programs. 

For example, the method called thermography is able to model fundamental
Go concepts such as sente and gote very naturally [3]. Thermography computes
the temperature of each local situation, which is a measure of move urgency.
Comparing the thermographs before and after a move yields an optimal temperature
range for each move. These ranges may differ dramatically between the two
players, for example in the case of one-sided sente moves. Using such an analysis,
programs will be able to follow the standard Go strategy of keeping sente
moves in reserve as potential ko threats. The theory is able to determine precisely
how long the ambient temperature of a game remains high enough to prevent
an opponent’s reverse sente move.
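The notion of temperature is easiest to see in the simplest case, a gote move between two settled scores. For a local game {a | b} with numbers a >= b (Left moves to a, Right moves to b), the mean value is (a + b)/2 and the temperature is (a - b)/2. The sketch below covers only this case with integer scores; sente sequences, ko and full thermographs are beyond it, and the function names are chosen for the example.

```python
from fractions import Fraction

def switch_mean_and_temperature(left, right):
    """Mean and temperature of the simple switch {left | right}, left >= right."""
    if left < right:
        raise ValueError("expected left >= right for a simple switch")
    mean = Fraction(left + right, 2)
    temperature = Fraction(left - right, 2)
    return mean, temperature

def hottest(local_games):
    """Name of the local game with the highest temperature.

    local_games: dict mapping a name to its (left, right) integer scores.
    """
    return max(local_games,
               key=lambda name: switch_mean_and_temperature(*local_games[name])[1])

areas = {"corner": (4, -2), "edge": (1, 0)}
for name, (left, right) in areas.items():
    print(name, switch_mean_and_temperature(left, right))
print("most urgent:", hottest(areas))
```

Playing in the hottest local game is a common rule of thumb derived from this measure, although it is not always optimal.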

Another important concept from combinatorial game theory is reversibility, 
which allows a player to make many moves based on a local consistency argu- 
ment, without any full-board analysis. The computational advantages of this idea 
are immediate. Thermography introduces the stronger notion of thermographic 
reversibility by which a further reduction of search can be achieved [3].

Recent research by Kao addresses handling incomplete local game trees and
selective search strategies within a combinatorial game framework. An important
research question is how to generalize the precise concepts of combinatorial
game theory to work in a heuristic setting, in analogy to the heuristic game tree
search based on minimax used in other games.



4.2 Handling of Ko Fights 

Ko fights are considered the most complex phase of the game, and are handled 
poorly by current programs. Progress in theory and in practical algorithms for 
thermography provides effective and sound methods for comparing the
relative values of ko and non-ko moves. This framework also allows the evaluation
of possible ko threats. 



5 Summary 

Computer Go has enjoyed a boom in recent years, but its progress is hampered 
by problems in the structure of the Computer Go community. An analysis of the 
current state of Computer Go indicates promising directions for research, both 
short-term and long-term. To overcome the lack of critical human resources, 
Computer Go would benefit from the same kind of larger scale projects that 
have succeeded in chess. 
Abstract. This paper describes a new method for estimating the
strength of a group in the game of Go. Although position evaluation 
is very important in Computer Go, the task is very complex, so good 
evaluation methods have not yet been developed. One of the major fac- 
tors in evaluation is the strength of groups. In general, this evaluation is 
a difficult problem, so it is desirable to have a simple method for making 
a rough estimate. We define PON (Possible Omission Number) as a pre- 
cise measure for the strength of groups, and present a simple method for 
the rough estimation of PON by calculating n-th dame (liberties). We 
report some experiments indicating the effectiveness of the method. 
Keywords: Go, Strength of groups, Possible omission number, PON, 
n-th dame 



1 Introduction 

Research in computer Go is becoming increasingly popular. Unlike games such 
as chess or Othello, no effective way of position evaluation has been developed. 
Many methods for selecting candidate moves in Go are not based on position 
evaluation. The reason is that the game board is large and the components of po- 
sition evaluation are not clear. We believe that the development of more accurate 
position evaluation is needed for the development of high-level playing programs 
and detailed position analysis. In earlier work, we proposed a method which estimates the
value of candidate moves by directly evaluating the position variation caused by 
the moves. That method also needs accurate position evaluation. 

One of the difficult problems in evaluating Go positions is that one should 
evaluate both the global configuration of stones and local semeai (mutual at- 
tacks) in a balanced way. One of the most important components for both 
global and local evaluation is the strength of groups. In local semeai, the relative 
strength of groups fighting each other decides the winner. Arranging stones in a 
globally optimal configuration means overall balancing of group strengths, while 
also considering the strength of opponent’s groups. 

Group strength is determined by the size and shape of the group and the 
configuration of nearby friendly and opponent stones. To calculate the exact 

* We thank Dr. Martin Muller for extensively proofreading this paper, and for useful 
technical feedback. 
strength in general, deep search or the use of a pattern database 
is necessary. Exact evaluation is not always needed, however. In practical play, 
sometimes a rough estimation is enough or is forced because of shortage of 
time. Simple estimation methods are useful in such cases. For many playing 
programs, group strength evaluation is one of the most important components. 
Such programs typically evaluate strength empirically by the number of dame, 
the number of eyes, the shape of the group, etc. 

In many cases, however, the resulting value is not concrete or substantial, 
but arbitrarily devised and abstract. In this case, its rationale is not clear, so 
that its application needs experience and it is not suitable for analysis. Therefore 
we must find a value that is substantial and applicable to estimate the strength 
of groups. We consider the number of possible tenuki (or omission of making 
moves), which was originally introduced as X life in earlier work, to be such a 
value. In this paper we name this number the possible omission number (PON) and 
show that it can be estimated from the number of n-th dame. 

To state the conclusion briefly in advance, the estimation of group strength 
by this method cannot replace precise search methods for determination of life 
and death for tightly surrounded groups. However, it provides a low-cost and 
practical means of evaluating the strength of immature groups which are loosely 
surrounded in the opening or in the early middle game. 

In the next section, we define the possible omission number and describe how to 
enumerate the n-th dame which are used to estimate this number. In Section 3 we 
experimentally find a method for approximately computing the possible omission 
number. We discuss the strengths and weaknesses of our method in Section 4 
and conclude this paper in Section 5. 

2 Possible Omission Number and n-th Dame 

In the following description, we assume that it has already been determined by 
some other method which stones belong to the same group. For life and death of 
groups, we consider capturing the whole group as death, and living with at least 
some part of the group as life. We define the possible omission number (written 
as PON below) for a group as follows: 



Possible Omission Number (PON) 

Consider a group G of color C. 

(a) Group G is neutral (i.e. life and death depends on the next turn). 
PON of G is 0. 

(b) Group G is alive. If G becomes neutral after n opponent moves in a 
row but G is still alive after n - 1 consecutive opponent moves, PON of 
G is n. 

(c) Group G is dead. If G becomes neutral after C is allowed to make n 
consecutive moves but G is still dead after n - 1 consecutive moves, PON 
of G is -n. 
Intuitively, PON means how often the player or the opponent can ignore a local 
play before the group status changes. 
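
The definition can be read as a small procedure. The following Python sketch is only an illustration under stated assumptions: `status` is a hypothetical oracle (it is not part of the paper) that reports whether the group is "alive", "neutral", or "dead" after a given number of consecutive local moves, and `toy_oracle` is a made-up stand-in used to exercise the function. In practice such an oracle would require search or expert judgement.

```python
def possible_omission_number(status, max_moves=10):
    """Return the PON of a group, or None if no status change is found.

    `status(n)` is a hypothetical oracle:
      n > 0  -> status after n consecutive opponent moves,
      n < 0  -> status after |n| consecutive moves by the group's owner,
      n == 0 -> status of the current position.
    It must return "alive", "neutral", or "dead".
    """
    current = status(0)
    if current == "neutral":
        return 0                      # case (a)
    sign = 1 if current == "alive" else -1
    for n in range(1, max_moves + 1):
        # Case (b): smallest n opponent moves that make an alive group neutral.
        # Case (c): smallest n own moves that make a dead group neutral.
        if status(sign * n) == "neutral":
            return sign * n
    return None


def toy_oracle(true_pon):
    """Made-up oracle for testing: each move shifts the status by one step."""
    def status(n):
        remaining = true_pon - n
        if remaining > 0:
            return "alive"
        if remaining < 0:
            return "dead"
        return "neutral"
    return status
```

With this toy oracle, `possible_omission_number(toy_oracle(3))` walks through one, two, and three opponent moves and stops at the first neutral status.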

Although PON cannot be regarded as the very measure of the strength of 
groups, it is clear that PON is closely related to the strength of groups. Popma 
and Allis made use of the number in the analysis of multi-step ko positions. 
The usage is for rather closely surrounded groups often seen in the end game. 
However, computing the number is effective not only in such positions but also 
in positions often seen in the opening or middle games, where two kinds of 
applications of PON can be considered: First, PON can be used as a criterion for 
making moves strengthening the player’s groups and weakening the opponent’s 
groups. Second, PON can be used to decide which group is the most urgent when 
there is more than one weak group. For example, in a battle between two groups 
of different colors, groups having similar PON are usually urgent and groups 
having very different PON are usually not urgent. 

It is difficult to calculate PON precisely in complicated positions, e.g. when 
groups are fighting closely. Exhaustive search has to be used, which is impossible 
when analysing under time pressure. An alternative is employing lots of known 
patterns. Human experts can decide their moves quickly due to their extensive 
pattern knowledge. But in a program, it requires a lot of resources to create a 
big pattern database and to search patterns. We think that a method that can 
estimate PON easily with pretty good accuracy is necessary. 

We will show such a method in the following. As a preparation, we define 
group, edge, n-th dame (or liberty) of a group, and kosuri (or rub). 

group a set of stones of the same color which can be regarded as at least loosely 
connected. 

edge the outermost intersections on a Go board. 

n-th dame and kosuri Span antennas step by step from the stones of the 
group to empty points along the lines of the board. Empty points which 
can be reached in n steps in a manner defined below are called n-th dame. 
During these steps, empty points are labeled by their kosuri number, which 
is defined as follows: If a point is adjacent to (an) opponent’s stone(s), count 
one kosuri for each opponent stone. Propagate the kosuri numbers along the 
lines on the Go board. Only points for which the kosuri number is at most 
one can be reached. 

There are two additional rules for kosuri: For points immediately under op- 
ponent’s stones on the second or the third lines, add 2 to the kosuri number. 
For points on the edge, add 1 to the kosuri number. 

The kosuri number has the following meaning: life and death of a group depends 
on how easily it can escape when it is surrounded by its opponent. If a path for 
escape touches an opponent’s stone, escape is hard. It is still possible to escape if 
the path touches once, but there is little possibility to escape if the path touches 
twice. The reason why points immediately under opponent stones on the second 
or the third line are regarded as having touched twice is that it is very hard to 
escape in such situations. 
Fig. 1. An example of n-th dame 



Fig. 1 shows an example of a black group. Dame up to n = 4 are denoted by 
the numbers. Black stones a, b, c, and d form one group, and the black stone 
e constitutes a different group. Consider the path leading straight upward from 
stone b. The first dame R7 is a point without kosuri. The second dame R8 is a 
point with one kosuri from the white stone g next to it. Points with one kosuri 
are denoted by a bold italic font. Since the kosuri number is accumulated, the 
third dame R9 and the fourth dame R10 are also points with one kosuri. Point Q9, 
to the left of R9, becomes a point with kosuri number 2 and cannot be reached 
because it touches the white stone g. Consider another path going downward 
from stone d. The second dame P1 is a point with one kosuri because it is on the 
edge of the board. O2, at the lower left of d, is a second dame, but it cannot be 
reached because it has kosuri number 2: it is immediately under the opponent’s 
stone j, which is on the third line. 
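
The definitions and the example above can be sketched as a breadth-first expansion from the stones of the group. The Python code below is one possible reading, not the authors' implementation. It makes several assumptions: points are `(column, row)` pairs counted from 1 (so R is column 17 when the letter I is skipped), `stones` maps occupied points to "B" or "W", a point "immediately under" an opponent stone is taken to be the neighbour one line closer to the edge, that stone then counts as 2 kosuri instead of 1, and when several paths of the same length reach a point the smallest accumulated kosuri is kept.

```python
NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def line_number(point, size):
    """Line of the board a point lies on (1 = edge line)."""
    x, y = point
    return min(x, y, size + 1 - x, size + 1 - y)


def local_kosuri(point, stones, opponent, size):
    """Kosuri picked up when stepping onto `point` (assumed reading of the rules)."""
    x, y = point
    line = line_number(point, size)
    kosuri = 1 if line == 1 else 0            # points on the edge add 1
    for dx, dy in NEIGHBORS:
        neighbor = (x + dx, y + dy)
        if stones.get(neighbor) != opponent:
            continue
        neighbor_line = line_number(neighbor, size)
        # Directly under an opponent stone on the 2nd or 3rd line: counts twice.
        if neighbor_line in (2, 3) and line == neighbor_line - 1:
            kosuri += 2
        else:
            kosuri += 1
    return kosuri


def nth_dame(stones, group, size=19, max_degree=4):
    """Return (degree, kosuri) dicts for all dame reachable within max_degree steps."""
    color = stones[next(iter(group))]
    opponent = "W" if color == "B" else "B"
    reached = {p: 0 for p in group}           # accumulated kosuri of reached points
    degree = {}
    frontier = set(group)
    for n in range(1, max_degree + 1):
        found = {}
        for x, y in frontier:
            for dx, dy in NEIGHBORS:
                q = (x + dx, y + dy)
                if not (1 <= q[0] <= size and 1 <= q[1] <= size):
                    continue
                if q in stones or q in reached:
                    continue
                k = reached[(x, y)] + local_kosuri(q, stones, opponent, size)
                # Only points with accumulated kosuri of at most one are reachable.
                if k <= 1 and k < found.get(q, 2):
                    found[q] = k
        for q in found:
            degree[q] = n
        reached.update(found)
        frontier = set(found)
    kosuri = {p: reached[p] for p in degree}
    return degree, kosuri
```

For a lone black stone at R6 with a white stone at Q8, this sketch labels R7 as a first dame without kosuri and R8, R9, R10 as dame with one kosuri, and it leaves Q9 unreached, in line with the path described for stone b in Fig. 1. The real position contains more stones, so the full figure is not reproduced here.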

The reason for our definitions is the following: In general, groups with a lot 
of dame can make eyes easily. Conversely, groups with a small number of 
dame are hard to keep alive. Among the n-th dame, dame closer to a group are 
more effective for making eyes. 

We have defined three kinds of dame, i.e. edge points, points with one kosuri, 
and other points. The meaning of these kinds for PON is as follows: As described 
before, points with one kosuri are worse than those without kosuri because escape 
is more difficult. Points on the edge are preferable because making eyes there is 
easier. 

In the following section, we will experimentally find an estimate for PON 
from the number of dame. 

3 Designing an Evaluation Function 

We made some experiments to find a method for the simple estimation of PON. 
We will explain the test instances used and show the results of the experiments. 

3.1 Instances Applied in This Paper 

We identified 23 groups, which are neither completely dead nor completely alive, 
in the seven opening and midgame positions of a game in [6]. These instances are 
shown in the Appendix. For each instance, the numbers of dame of degree 1 to 4 of 
each kind were counted, and the real PON was estimated by a human expert 
of amateur six dan rank. The real PON of the 23 instances varies from -2 to 5 
(see Fig. 3). 

3.2 Experiments 

Assume that we can describe the PON, N, by a function of the numbers of n-th 
dame. In this paper, we assume a simple approximation function 

$$N = f\left( \sum_{i=1}^{4} w_i \sum_{k=e,t,u} w_k d_{ik} \right)$$

where $w_i$, $w_k$, and $d_{ik}$ are the weight for the degree i, the weight for the kind k 
($w_e$ is the weight for edge points, $w_t$ is the weight for the points with one kosuri 
except edge points, and $w_u$ is the weight for other points), and the number of 
dame of degree i and kind k, respectively. f is an appropriate function. 

Since we limit the degree of dame to 4, this approximation function has seven 
parameters, i.e. $w_1$, $w_2$, $w_3$, $w_4$, $w_e$, $w_t$, and $w_u$. We observe for each instance how 

$$g(w_1, w_2, \ldots, w_u) = \sum_{i=1}^{4} w_i \sum_{k=e,t,u} w_k d_{ik}$$

changes as parameters change. It is the purpose of these experiments to observe 
how the value of $g(w_1, \ldots, w_u)$ (shown as S hereafter) corresponds to the real 
PON, and what the optimum values of the weights are. 

Several kinds of measures could be used for evaluating the approximation 
function. The most important condition is that S correlates with PON. Therefore 
the rate of reversed combination among all pairs of instances, as defined below, 
is a good measure. We define reverse order and the right order rate. 
reverse order and right order rate Suppose S and real PON are given for 
each group in a set of instances. Choose two groups $g_1$ and $g_2$ and let their S 
and PON values be $S_1$, $S_2$ and $P_1$, $P_2$, respectively. If $S_1 > S_2$ and $P_1 < P_2$, or 
$S_1 < S_2$ and $P_1 > P_2$, S of these two groups is in reverse order. Considering 
all combinations of two groups, the fraction of pairs for which the value S is 
not in reverse order is called the right order rate of the function. 

For example, if we have a set of instances of 23 groups, we get 253 pairs of 
groups. If 10 pairs are in reverse order, the right order rate is about 0.96. 
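
The measure defined above is straightforward to compute. The following Python sketch is an illustration only; the function names are ours, not the paper's. It counts the pairs in reverse order and derives the right order rate. Pairs with equal S or equal PON are not counted as reversed, which follows the strict inequalities in the definition.

```python
from itertools import combinations


def count_reverse_pairs(s_values, pon_values):
    """Number of pairs whose S values and real PON values are ordered oppositely."""
    reverse = 0
    for (s1, p1), (s2, p2) in combinations(zip(s_values, pon_values), 2):
        if (s1 > s2 and p1 < p2) or (s1 < s2 and p1 > p2):
            reverse += 1
    return reverse


def right_order_rate(s_values, pon_values):
    """Fraction of pairs that are not in reverse order."""
    n = len(s_values)
    total_pairs = n * (n - 1) // 2
    return 1 - count_reverse_pairs(s_values, pon_values) / total_pairs
```

For 23 instances the number of pairs is 23 * 22 / 2 = 253, and 10 reversed pairs give 1 - 10/253, or about 0.96, as in the example above.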



The Effect of the Weights Concerning the Degree of Dame We tested 
how the number of pairs in reverse order varies as we change the degree weights 
while keeping the kind weights constant at $w_e = w_t = w_u$. Tables 1 to 3 show 
the number of pairs in reverse order when $w_1 = 1$ and $w_2$, $w_3$, and $w_4$ change. 
The following properties can be observed: 

- The number of pairs in reverse order is 21 in the best case. The corresponding 
  right order rate of about 92% seems insufficient for practical use. 

- The smaller the degree, the greater the effect, i.e. the effect of $w_2$ is greater 
  than that of $w_3$, and that of $w_3$ is greater than that of $w_4$. 

- $w_4$ does have an effect, but it is small. 

- If the weight of some degree is too small, the weights of other degrees 
  can compensate for it to some extent. For example, the optimum value of $w_2$ 
  becomes greater than 1 when $w_3 = w_4 = 0$, which is greater than 
  the optimum value (approximately 0.5) in the case where neither $w_3$ nor $w_4$ 
  is 0. 

- There is no sharply defined optimum. Under the condition $w_1 > w_2 > w_3 > w_4$, 
  the value ranges $w_2 = 0.4$ to $0.6$, $w_3 = 0.2$ to $0.3$, and $w_4 = 0.2$ to $0.3$ 
  are almost optimal. 

In further analysis we found that the optimum values of weights are inversely 
proportional to the degrees. 
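
The experiment amounts to a grid search over the degree weights. The Python sketch below shows the idea under assumptions of our own: each instance is given as a tuple of four totals (the number of dame of degree 1 to 4, with all kinds weighted equally), $w_1$ is fixed at 1, and the first weight setting with the fewest reversed pairs is returned. The data in the usage note are made up and are not the 23 instances of the paper.

```python
from itertools import combinations, product


def weighted_sum(dame_by_degree, degree_weights):
    """S for one instance when the kind weights are all equal."""
    return sum(w * d for w, d in zip(degree_weights, dame_by_degree))


def reverse_pairs(scores, pons):
    """Number of pairs whose scores and real PON values are ordered oppositely."""
    return sum(
        1
        for (s1, p1), (s2, p2) in combinations(zip(scores, pons), 2)
        if (s1 > s2 and p1 < p2) or (s1 < s2 and p1 > p2)
    )


def grid_search(instances, pons, grid):
    """Try every (w2, w3, w4) on the grid; return (reversed pairs, weights)."""
    best = None
    for w2, w3, w4 in product(grid, repeat=3):
        weights = (1.0, w2, w3, w4)
        scores = [weighted_sum(d, weights) for d in instances]
        reversed_count = reverse_pairs(scores, pons)
        if best is None or reversed_count < best[0]:
            best = (reversed_count, weights)
    return best
```

As a toy usage, `grid_search([(4, 0, 0, 0), (0, 6, 0, 0)], [2, 1], [0.0, 0.5, 1.0])` finds a setting with no reversed pair, whereas weighting second dame as heavily as first dame would reverse the only pair.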



The Effect of the Weights Concerning the Kinds of Dame This experiment 
was designed to determine good weights for the kinds of dame. From the 
results of the preceding experiment, we set the values of $w_1, \ldots, w_4$ as follows. 

$w_1 = 1$, $w_2 = 0.5$, $w_3 = 0.35$, $w_4 = 0.25$ 

Table 4 shows the number of pairs in reverse order when $w_u = 1$ and $w_t$ and 
$w_e$ change. The following properties can be observed: 
Table 1. The number of pairs in reverse order when the weights concerning the 
degree change 

w4 = 0 

w2 \ w3    0  0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9    1
0         43   43   39   35   30   28   25   26   25   26   24
0.1       45   42   36   33   29   29   28   26   25   25   29
0.6       32   33   30   29   26   22   24   28   28   30   30
0.7       32   28   30   28   24   24   28   28   30   30   32
0.8       29   29   28   26   25   24   28   30   30   32   32
0.9       28   29   27   28   25   28   30   32   32   32   34
1         24   28   26   28   25   28   32   32   32   34   34
1.1       27   26   28   27   31   32   32   33   34   34   35
1.2       27   28   28   31   33   32   33   33   34   34   35
1.3       27   30   30   33   33   33   33   34   35   35   35

0.2 to 0.5 (incomplete): ... 31 30 26 ... 25 ... 29 28 ... 39 ... 32 32 28 27 26 23 27 28 28 ... 32 31 28 26 23 27 28 28 30 ... 31 33 30 29 27 23 23 26 28 30 30 

w4 = 0.1 

w2 \ w3    0  0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9    1
0.2       38   33   27   30   26   26   25   25   27   28   28
0.3       33   29   30   28   27   26   23   27   28   28   30
0.4       31   30   30   27   27   23   25   26   28   30   30
0.5       32   30   27   26   23   23   26   28   30   30   30
0.6       30   29   27   25   22   24   26   28   30   30   30
0.7       28   27   26   23   24   26   28   30   30   30   33
0.8       27   28   24   24   24   28   30   30   30   32   34
1.1       25   27   25   26   32   32   32   34   34   35   34
1.2       27   27   25   32   32   33   33   34   34   35   34
1.3       27   29   31   32   33   33   34   35   35   35   34

0 to 0.1 (incomplete): ... 26 25 ... 25 ... 26 29 27 25 24 24 26 28 
0.9 (incomplete): ... 26 24 24 26 30 30 32 32 34 33 
1 (incomplete): ... 25 26 24 28 30 32 32 34 34 34 

- The effect of $w_e$ is great; the effect of $w_t$ is not so great. 

- One weight cannot compensate for a wrong setting of the other weight. This 
  phenomenon differs from what is seen in the case of the degree weights. 

- The optimum weights are $w_e = 1.6$ and $w_t = 0.2$, and the number of pairs 
  in reverse order is 5 for the optimum weights. The right order rate is about 
  98%, which seems sufficient for practical use. 
As described before, the closer to the group a dame is, the more effective it is 
for making eyes. Points with kosuri have a disadvantage. Dame on the edge are 
more effective because it is easier to make eyes there. The weights we found 
reflect these properties. 



Table 2. The number of pairs in reverse order when the weights concerning the 
degree change (Continued) 

Table 3. The number of pairs in reverse order when the weights concerning the 
degree change (Continued) 

Table 4. The number of pairs in reverse order when the weights concerning the 
kinds of dame change 

Fig. 2. Linear summation of n-th dame vs. real PON 



3.3 PON Approximation Function 

Fig. 2 shows the relation between the real PON and 

$$S = \sum_{i=1}^{4} w_i \sum_{k=e,t,u} w_k d_{ik}$$

calculated with the weights decided in the preceding subsection. 
Each point corresponds to one of our 23 instances. The right order rate of 98% 
shows that the right order is kept in all but three instances (denoted by the 
points j, n, and s). One can calculate an approximate PON easily by a function 
f(S). For example, 

$$f(S) = \lfloor 0.33 S - 1.96 \rfloor$$

where $\lfloor x \rfloor$ is the maximum integer not exceeding x, is a good approximation 
function. Preliminary experiments on further test instances seem to show that 
this function can estimate the correct PON in most cases. 
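
Putting the pieces together, the estimate can be computed directly from the dame counts. The Python sketch below uses the weights reported in the preceding subsection and the example function f(S). The input layout, a dictionary keyed by (degree, kind), and the function names are assumptions made for illustration.

```python
import math

# Weights reported in Section 3.2 (degree weights and kind weights).
DEGREE_WEIGHTS = {1: 1.0, 2: 0.5, 3: 0.35, 4: 0.25}
KIND_WEIGHTS = {"e": 1.6, "t": 0.2, "u": 1.0}   # edge, one kosuri, other


def weighted_dame_sum(dame_counts):
    """S = sum over degrees i and kinds k of w_i * w_k * d_ik.

    `dame_counts[(i, k)]` is the number of dame of degree i and kind k;
    missing keys are treated as zero.
    """
    return sum(
        DEGREE_WEIGHTS[i] * KIND_WEIGHTS[k] * count
        for (i, k), count in dame_counts.items()
    )


def estimate_pon(dame_counts):
    """Approximate PON as floor(0.33 * S - 1.96)."""
    return math.floor(0.33 * weighted_dame_sum(dame_counts) - 1.96)
```

For a made-up group with ten first dame of the ordinary kind, S = 10 and the estimate is floor(3.3 - 1.96) = 1. A group with no dame at all gets floor(-1.96) = -2.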
4 Discussion 

As stated in Section 3, one can estimate PON by the number of n-th dame quite well, 
and we think that the method is promising. However, there remain misclassified 
instances as shown in Fig. 2. Let us discuss the method. 



4.1 Discussion of Experiments 

- We computed dame of degree up to 4, and the effectiveness of dame, especially 
  of degree up to 3, was confirmed. Dame of degree 4 are effective to some extent. 
  It is not clear whether further dame are effective or not, but we guess that 
  dame of degree up to 4 are enough for a rough estimate such as the one 
  described here. 

- It was found that dame on the edge are very effective, while dame with kosuri 
  are not. 

- Three of 23 instances (points j, n, and s in Fig. 2) were not correctly estimated. 
  The reasons are as follows: 

  Instance j The real PON is smaller than expected, because group k influences 
  group j. We treated j and k as separate groups, but the number of 
  substantial dame of group j is smaller than computed. The dame in the 
  lower part are not as effective as expected. 

  Instance n The real PON is a little smaller than expected. This is because 
  the shape of the group is bad. The group has spread in three directions 
  and has a complicated form. Such groups are liable to have weak points. 

  Instance s The real PON is one greater than expected. This is a special 
  situation near the edge and near another friendly group. With an 
  opponent’s stone on the third line, usually a group cannot escape. But 
  in this case, the friendly group immediately above the opponent’s stone 
  helps with the escape even after an extra tenuki. Since our method does 
  not take nearby friendly groups into consideration, this kind of error 
  sometimes occurs. 

From these exceptional instances, we see that there are critical cases in which 
other methods are necessary to get an exact PON. 



4.2 Discussions of This Method 

In real games, we can classify the elements determining the strength of groups 
as follows: 

life and death the possibility of establishing eye shape by a group itself; 
escape the possibility of escaping to the outside by finding a gap in the oppo- 
nent’s enclosure; 

counter attack the possibility of breaking through or capturing a part of the 
opponent’s enclosure; 

connection the possibility of connection to a near group of the same color. 
However, only two of them, life and death and escape, are evaluated in the method 
proposed in this paper. This is a consequence of the simplicity of the calculation. 
Therefore, the following issues are topics for further research: 

— Friendly nearby groups 

As seen in instance s, the effect of such groups must be considered for 
more accurate estimation. For example, the difference in the number of dame 
between the case where two groups can be connected and the case where they 
cannot has to be considered. 

— Distance 

The degree of dame considered in the current method is limited to 4. A 
greater number of degrees might be necessary for more accurate estimation. 

— Kinds of dame 

Three kinds of dame are distinguished in the current method. The effectiveness 
of the discrimination is clear from the experiments in Section 3. A finer 
discrimination might be effective. For example, dame points close to the edge 
are more effective than those in the center of the board from the viewpoint 
of making eye shape easily. Discriminating dame points by whether they 
are inside or outside an area enclosed by the group may also be effective for a 
more precise judgement of life and death, because inner dame points have a 
high possibility of constituting eyes. 

— Recognition of groups 

In this paper, we defined the life and death of a group considering only 
whether one can capture the whole group or not. In some situations, however, 
a part of a group becomes a target of capturing. Since capturing a part of 
a group is easier than capturing the whole group, the PON in such a case 
becomes small. For example, in instance d, one can consider a different group 
consisting of the leftmost stone only, if the opponent’s target is to capture 
only that stone. In this case, the PON becomes 2. Another example is that 
the group in instance j and the group in instance k could be combined into 
one group. 

This matter is related to the general problem of what set of stones should be 
regarded as a group. How one recognizes groups influences the calculation of 
PON. Recognizing groups appropriately is a precondition for the practical 
application of this method. 

— Intrinsic limitations 

Since the method is based on a simple calculation, very accurate estimation 
cannot be expected. In short, the major problem of the method is its narrow 
scope. Generally speaking, the life and death of a group may depend on all 
stones around the group, or even the presence and the strength of groups 
located far from it. Especially in the case of semeai (mutual attack), the life 
and death of a group depends critically on all opponent stones surrounding 
the group and all own stones surrounding the opponent’s stones. These cases 
usually require more detailed analysis. Therefore, this method seems to work 
better in the case of open groups, which appear mainly in the opening and 
middle games, than tightly surrounded ones. 
5 Conclusion 

The number of times that a player or the opponent can tenuki without changing 
a group’s status is a very important evaluation component in the game of Go. 
We defined the possible omission number, proposed a method that estimates this 
number from the number of n-th dame and their kinds, and confirmed experimentally 
that the method estimates the possible omission number with good accuracy. We 
analyzed the causes of remaining errors, and discussed the merits and possible 
improvements of the method. We think that the method is very promising for 
the evaluation of the strength of groups, especially in the early stage of a game. 

Some issues remain. The applicability of the method is limited to some 
extent. Some modification that considers the configuration and the strength of 
other nearby groups is needed for more accurate estimation. It is also better to 
combine the method with search or patterns, or to select among them as needed. 
A practical approach is to use this method in ordinary situations and more 
accurate methods in special situations. How to choose between these methods 
should be studied. 

Of course, the goal of estimating the possible omission number is exact 
position evaluation that can help in finding the best candidate moves. How to 
apply the possible omission number to this goal is the next interesting problem. 
Abstract. The use of Go terms while playing Go differs according to the 
player’s skill. We conduct three experiments to examine this in detail. 
In the first experiment, players’ spontaneous utterances (called proto- 
cols) were collected. We analyze these protocols in two ways. One is the 
number of Go terms used, and the other is the contents of the terms, 
such as strategic or tactical. The second experiment examines how well 
the players knew the configurations of the stones. From the two experi- 
ments, we find that even if the subjects know of many Go terms, their use 
depends on the subject’s skill. The third experiment considers “Soudan- 
Go,” where two players form a team. They are in the same room and 
can freely talk to each other; their spontaneous utterances (protocols) 
were collected. We also analyze reports of “Houchi Soudan-Go,” which 
is a Soudan-Go match between professional players. We find that expert 
players often use Go terms and they understood their partner’s inten- 
tions without needing a full explanation. Intermediate level players often 
talked over their plan and their opponent’s plan using many Go terms. 
From our analyses we developed a hypothesis which we call the iceberg 
model. The purpose of the model is to explain the structure of a term in 
the human brain from the viewpoint of the role of the term. Although 
this is still a hypothesis, it will become an important guide when carrying 
out protocol analyses and modeling the thought processes of Go players. 
Keywords: Cognitive science, Go, Special terms, Expert knowledge, Iceberg model 



1 Introduction 

Go is one of the most sophisticated two-player, complete information, board 
games in the world. In AI, the next grand challenge after chess is thought to be 
building a strong Go-playing program. Yet after more than 30 years of effort, Go- 
playing programs have achieved only human beginner level. There is undoubtedly 
more to be learned from actual human players but up until now there have been 
very few psychological and cognitive studies of Go-playing. Thus, we started 
a series of cognitive studies of Go-playing, using mainly traditional protocol 
analyses and an eye camera. Go-players’ protocols in real matches 
have been gathered and analyzed. Our main purpose is to build a model of Go 
players’ problem solving behavior. This model will help us to understand how 
humans cope with complex problems such as decision making in Go, and will also 
become a starting point in our further study of how humans acquire expertise 
in semantically rich and complex problem domains, such as Go. Also, the model 
may suggest a good approach to designing strong Go programs. 

Computer Go study began in the 1960’s, but Go programs are still at the 
beginner level. Although much effort has been made to create strong Go pro- 
grams, the methods are restricted to those where programmers’ introspection on 
Go was analyzed by themselves and the results were programmed into their Go 
program. 

Our purpose is to make a model of Go players, specifically of strong players, 
by analyzing expert players. Our previous studies showed that the mechanism 
used by experts in solving Tsume-Go problems correctly and quickly is to use 
hybrid pattern knowledge. This knowledge is constructed from concrete 
level knowledge such as patterns and abstract level knowledge such as the 
board situation and the conditions for pattern application. Also, by analyzing 
their protocols, we showed that conceptual knowledge had an important role even 
if subjects were beginners. Note that most programs use a pattern-based or 
a search-based approach. Our protocol analyses showed that all subjects 
consider the current situation of the board or judge the next move by using 
the terms specific to Go. However, the role of Go terminology is still an open 
question. It is also unclear whether the role of the terms changes with the 
subjects’ skill. Therefore, this paper clarifies the role of the terms by 
examining how the subject’s skill influences the use of Go terminology. 

Section 2 explains the classification of Go terms proposed by Shirayanagi, 
which will be used later herein to classify protocols. Section 3 explains the results 
of protocol analyses in ordinary play. They show that the usage of Go terms 
depends on the player’s skill. Section 4 shows the experimental results collected 
on Go term usage from various players, except novices. From the results of the 
two sections, we focus on why players use Go terms in different ways according 
to skill. Section 5 describes experiments on “Soudan-Go,” where two players 
form a team and can talk to each other while playing a game. The results show 
that players of different level use Go terms differently. The same Go term carries 
different information according to the player’s skill. Section 6 introduces the 
“iceberg” model to explain these results. 

2 Shirayanagi’s Classification of Go Terms 

This study is based on collecting and analyzing Go terms. Go has more terms 
than other games such as chess or Shogi. Chess and Shogi have piece names, 
whereas all stones are identical in Go. One reason for the variety of terms created 
to describe the roles of the stones seems to be that the arrangement of stones is 
the key to success. Go also has tactical terms, as does chess. First, we introduce 
the Go terms in the classification of Shirayanagi. The simple explanations 
of the terms are based on published definitions, as modified by us. 
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On the board 

Form and Position 

Posture It is the binary relation between the newest stone and side 
stone. 

— Narabi. A solid extension. 

— Kosumi. A diagonal extension. 

— Ikken-tobi. An on-space jump. 

Relationship It is the binary relation between the newest stone and 
enemy’s stone. 

— Nozoki. A peep. 

— Boushi. A capping move. 

— Tsuke. A contact play. 

Position 

— Komoku. Any of 3-4 points in the four corners. 

— Sazan. The 3-3 points in any of four corners. 

— Hoshi. A star point. 

Fuseki 

— San-ren-sei. Three star point stones in a row. 

— Syusaku-ryu. The Syusaku opening, characterized by 3-4 point 
moves rotating through three corners. 

— Mukai-komoku. A linear 3-4 point opening. 

Contents and Meaning 
Tactical moves 

— Oiotoshi. Capturing by creating a shortage of liberties through 
a series of sacrifices. 

— Uttegaeshi. A snapback. 

— Shicho. A ladder. 

Operation and Tactics 

— Shinogi. Saving an endangered group of stones. 

— Sabaki. Making light, flexible shape in order to save a group. 

— Kikashi. A forcing move requiring an answer. 

Evaluation and Judgment 
Configuration of stones 

— Aki-sankaku. An empty triangle. 

— Guzumi. A move that becomes the apex of an empty triangle. 

— Dango. A clump of stones. 

Move or stone 

— Aji-keshi. A move which eliminates the Aji in a potential. 

— Honte. A proper move. 

— Karai. A tight move, i.e., a strongly territory-oriented move or 
strategy. 

Go 

— Komakai-Go. A game in which there are many groups with a lot
of intricate situations scattered throughout the board. 

— Taisa. A big difference in the game.

Rule

— Ko. A situation of repetitive capture. 

— Seki. A situation in which neither of two groups of opposing 
stones has two eyes. 

— Nakade. Playing inside a larger eye in order to reduce it to a 
single eye. 

Out of the board 

Psychology and Strategy 
Psychology 

— Tsurai. A painful move.

— Ura-wo-kaku. Outwitting the opponent by doing just the opposite of
what he expected.

— Kiai. Being pumped up for the game.

Strategy 

— Amashi. A strategy for White in a no-komi game in which he 
lets the opponent take good points but as compensation takes 
territory, aiming to outlast the opponent. 

— Oh-moyo. An especially large framework of potential, but not yet
actual, territory.



3 Protocol Analysis of Ordinary Playing Go 

We collected many protocols during ordinary play. Subjects were placed in
separate rooms and were asked to think aloud while playing through a
computer monitor. Table 1 shows the skill of each subject, the playing
conditions, the total number of moves, and the amount of protocol transcribed in
each game. The amount of protocol in a game can be very large (around 300 KB
per game). In Table 1, novice means a subject who had just read a Go book
for beginners. We call a kyu-level player a ‘beginner’, a dan player up to 2-dan
‘intermediate’, and a player of 3-dan or above ‘advanced’.

The protocols were analyzed in two ways. One was to measure the frequency
of Go term usage: we used the “Chasen” Japanese morphological analysis system
to extract the terms. The other was to classify the contents of the utterances in
rough sentence units and then count them. Go protocols are naturally
divided into subparts by the moves made by either side; this is the ‘basic
unit’ of the analysis. Each basic unit contains several sentences. These sentences
were coded according to their contents. Occasionally, multiple sentences were
assigned to a single code. Table 2 shows the main codes used. The most
easily identifiable parts of the protocols were (N), (CM) and (L). We mainly used
the plan, purpose, reason and evaluation parts of our protocols in the analyses
reported below.
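
The second kind of analysis, counting coded contents per basic unit, can be
sketched as follows. This is only an illustrative sketch, not the procedure
actually used in the study: it assumes each basic unit has already been coded
by hand into a set of code labels, and the function name and the toy data are
hypothetical.

```python
from collections import Counter

# Code labels as listed in the table of main codes.
CODES = ("UP", "EP", "EG", "N", "CM", "L", "SR", "P-R", "VJ", "VM")


def percent_units_containing(units):
    """Return, for each code, the percentage of basic units that contain it.

    `units` is a list of sets; each set holds the codes assigned to the
    sentences of one basic unit (the utterances made around one move).
    """
    counts = Counter()
    for codes in units:
        # A code is counted once per unit, however many sentences carry it.
        counts.update(set(codes) & set(CODES))
    n = len(units)
    return {c: (round(100.0 * counts[c] / n, 1) if n else 0.0) for c in CODES}


# Hypothetical toy data: four basic units from one game.
units = [{"N", "EP"}, {"N"}, {"N", "L", "EP"}, {"N", "CM"}]
result = percent_units_containing(units)
print(result["EP"], result["L"], result["VJ"])
```

Counting each code at most once per basic unit matches the way the results
are reported, as a percentage of basic units containing a given content.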

3.1 Frequency of Go Term Usage 

Original Terms Formed by the Novices Very interesting results were found
when examining the protocols of novices. Since a novice does not know Go
terms, he/she made up original terms and used them. Some of the general Go
terms were used, but not often. Indeed, very few Go terms appeared in the
protocols, and those that did were elementary ones such as those in Table 3.

Table 1. Collected transcribed protocols in ordinary play of Go.

Match  Black   White   Total  Result                Protocol  Remarks
No.    Rank    Rank    Moves                        Size
1      4k      2k      287    White win             265KB     ordinary game; White wore eye camera
2      novice  novice  184    suspension            70KB      ordinary game; Black wore eye camera
3      2d      4d      88     Black resigned        247KB     with interviewer
4      3d      4d      182    Black resigned        313KB     with interviewer
5      1d      4d      152    Black resigned        182KB     with interviewer
6      1d      4d      247    Black won by 1 stone  371KB     with interviewer; 2 stones handicap
7      3k      1d      170    Black resigned        134KB     with interviewer; 2 stones handicap

In the 10th line of Table 3 under the Black side, he says “Ana-Futatsu (two
holes)” (this means “two eyes”). This is an example of an original term. He
also used other original terms, as in the following example.

That is a basic move. (long pause) That move is not understood, not
understood. I will capture these stones. Woo. That is a stupid idea, isn’t
it? Which move is the best? I can not judge that. A nursery song of
the notched heart (it is a phrase from a famous Japanese song) notches,
I select it.

The word “notches” reflects the shape shown in Fig. 1. The novice thought
that it was an easy way to make an eye (the novice called it a hole).

Note that the terms created by the novices mainly described stone posture 
or configuration. 




Fig. 1. The shape for “notches” created by a novice.

Table 2. Table of main codes used in our analyses.

Codes  Meanings                 Additional explanation
(UP)   Understand purpose       purposes behind a move
(EP)   Explain purpose          local and global
(EG)   Global plan              global plan involving whole board
(N)    Naming                   S(N): own move, O(N): opponent’s move
(CM)   Candidate move
(L)    Lookahead
(SR)   Reason of selection      Why one chooses a certain candidate
(P-R)  Prediction and response  A pair of prediction and response
(VJ)   Judge winning or not     global and local
(VM)   Evaluate move (local)    S(VM): own move, O(VM): opponent’s move


Table 3. Top 10 Go terms used by novices (ranked by frequency; counts in
parentheses).

Order  Black             White
1      Toru (44)         Utsu (78)
2      Tsugu (43)        Oku (27)
3      Utsu (30)         Toru (24)
4      White (18)        Nuku (23)
5      Kakou (16)        Black (14)
6      Black (15)        White (13)
7      Dead (11)         Dead (7)
8      Atari (10)        Tsugu (6)
9      Cut (8)           Lose the game (6)
10     Ana-Futatsu (6)   Katameru (3)


Rules Acquired by Novices The novices learned more rules heuristically 
while playing Go. For example, in a protocol: 

Probably, this group will not be captured if there are two holes. It is 
useless in case of being a corner. That group should not be captured if 
two groups are totally connecting. Therefore, here is a hole, and there 
is not a hole. There is another hole. It is useless to make a hole because 
this is a corner. 

In this protocol, the novice noticed two rules: “if there are two holes in a
group, then the group is alive” (that is, life with two eyes), and “in a corner,
that rule does not work” (that is, a false eye). As these examples show, people
can learn general rules and some heuristics even while playing a single game.
However, novices did not generate any strategic terms.

The Characteristic of Terms Appearing in Protocols The volume of
protocol produced by novices was very small, as shown in Table 1; novices could
not report what they were thinking. They always said “what could I do.” The
number of terms they used was very small, and many were original. Although
beginners produced more protocol than novices, they used only a few Go terms.
Intermediate players produced more protocol than beginners and often used
Go terms, but only a limited variety. Advanced players produced a large
volume of protocol and used many different terms.

The terms were classified according to Shirayanagi’s classification. As a rough
result, only “form and position” terms appeared in the protocols of novices. The
original terms they invented were always “form and position” terms. Beginners
and intermediate players mainly used “form and position” terms, similar
to novices. However, 3-dan or better players (advanced players) also used
“content and meaning” terms and “evaluation and judgment” terms. In particular,
“operation and tactics” terms within the “content and meaning” category
appeared frequently on their own turn after the middle game.
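
The category comparison above can be illustrated with a small sketch. It is
not the authors' tooling: the lookup table contains only a handful of terms
taken from the classification listed in Section 2, and the function name and
the sample input are hypothetical.

```python
from collections import Counter

# A few terms from the classification in Section 2, keyed by lower-case name.
# The table is deliberately incomplete; it only illustrates the idea.
TERM_CATEGORY = {
    "narabi": "form and position",
    "kosumi": "form and position",
    "ikken-tobi": "form and position",
    "nozoki": "form and position",
    "tsuke": "form and position",
    "hoshi": "form and position",
    "shicho": "contents and meaning",
    "shinogi": "contents and meaning",
    "sabaki": "contents and meaning",
    "kikashi": "contents and meaning",
    "dango": "evaluation and judgment",
    "aji-keshi": "evaluation and judgment",
    "honte": "evaluation and judgment",
}


def category_profile(terms):
    """Count how many of the extracted terms fall into each category.

    Terms missing from the table (for example, a novice's original term)
    are skipped rather than guessed.
    """
    counts = Counter(
        TERM_CATEGORY[t.lower()] for t in terms if t.lower() in TERM_CATEGORY
    )
    return dict(counts)


# Hypothetical terms extracted from one player's protocol.
profile = category_profile(["Kosumi", "tsuke", "Sabaki", "honte", "ana-futatsu"])
print(profile)
```

Comparing such profiles across skill levels is one way to express the
observation that weaker players stay within “form and position” terms.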



3.2 Contents of Utterances 

Table 4 shows the results of an analysis of the frequency of appearance of
contents in the protocols from matches 2, 1 and 3. Naming ((N)), which appears in
most of the basic units, has been omitted from this table.



Table 4. Percentage of basic units containing the following contents (%).

Contents                     novice     beginner   intermediate
                             (match 2)  (match 1)  (match 3)
Purposes ((UP) (EP))         10.5       16.2       56.8
Candidate move ((CM))        0.8        6.1        41.0
Lookahead ((L))              4.6        12.0       18.2
Judge winning or not ((VJ))  0.0        2.3        6.3
Reason of selection ((SR))   0.0        0.0        2.3
Global plan ((EG))           0.0        0.5        5.1



The novices used the (EP) and (L) categories and rarely used other
categories. The beginners used all items except (SR). The increased
rate of (UP) usage implies an increase in the level of “purpose”.

The intermediate players used all items. Utterances of (UP), (EP)
and (CM) were much more frequent than for the beginners. This
indicates that the intermediate players tried to understand the intention of the
opponent’s moves, and so they examined their next move while being aware
of (CM). Reasons for choosing candidates were rarely explained by using Go
terms (see footnote 1).

The protocol usage of advanced players was similar to that of intermediate
players except for (L). (N), (VM) and (P-R) usage increased, although this does
not appear in Table 4. The increased (P-R) usage indicates that the advanced
players may have typical sequential patterns.

In summary, advanced players first think about their own purpose and their
opponents’ purpose (intention); only then do they generate candidate moves
and look ahead. Novices and beginners rarely judged whether they were winning
or not, and they seldom explained the reason for a move selection or proposed a
global plan. Purpose (both one’s own and the opponent’s) seemed to be the
main concern of all players.



4 Do Weak Players Know Go Terms? 

The experiments in Subsection 3.1 showed that weak players (beginners and
intermediate players) use a limited range of Go terms. We carried out two
experiments to investigate whether the weak players knew Go terms or not.

First, we gave a vocabulary test of Go terms, in which subjects explained Go
terms by placing stones on a board. We found that weak players have
less knowledge than advanced players and that there was a positive correlation
between vocabulary test scores and skill.

Next, in order to see whether weak players really did not know Go terms,
we conducted a recognition test, in which subjects provided the Go term for the
situation they were shown. This experiment was carried out on two subjects
who had scored poorly on the vocabulary test. We found that the subjects
could easily recognize the situations using Go terms, even though they were not
strong players.



4.1 A Vocabulary Test of Go Terms 

The method for this experiment is as follows. A Go term was presented in the 
upper left corner of a monitor. The subjects then placed stones on the Go board 
displayed in the center of the monitor or explained the meaning of the term. 
Two advanced players, an intermediate player, and a beginner were examined. 
One hundred of the Go terms that are often used in Go books or commentaries 
were shown. 

As a result, we found a difference in the knowledge of Go terms according to
the subjects’ skill. Table 5 shows the percentage of correct
answers. Abstract terms such as “thickness” or “katachi” have more than one
“correct answer”. When a reasonable explanation was given, we judged it as a
“correct answer (knowing the term)”. A “wrong answer” was recorded only when
the answer was “don’t know” or nothing could be explained at all. Even with this
loose criterion, Table 5 shows that the number of terms known depends on
skill: the better one’s skill is, the more terms one knows. While the difference
in skill between beginner and advanced players is huge, the results are not
significantly different. We think that there are two reasons for this. One is that
we used very famous terms in the test. The other is the loose criterion used. The
protocols of the experts were more accurate than those of the beginners.

1 One reason why they explained the reasons may be because an interviewer sat beside
the player during each game.



Table 5. The correct rate of a vocabulary test of Go terms.

level         correct rate
beginner      66%
intermediate  42%
advanced      92%
advanced      93%



4.2 A Recognition Test of Go Terms 

This experiment was conducted as follows. Some specific board situations were
presented on a monitor in front of the subject. Sometimes a static configuration
of stones was presented, and sometimes a sequence of moves was displayed.
Each situation had a proper name. For example, the numbered stones in Figure 2
were sequentially added to the board, and the subject was asked the name of
the move sequence. The recognition test was held at least one week after the
vocabulary test.



Table 6. The rate of correct answers for the board situation recognition experiment.

Subject       No. of    No. of effective  No. of correct  Correct  Correct rate for
              problems  problems          answers         answer   original 100 words
beginner      37        34                18              53%      83%
intermediate  46        46                27              59%      79%



The procedure is explained in detail below. When the initial board situation
was presented, we asked the subject what would be his choice for the next move.
Sometimes the subject used the expected term when he explained his choice.
After showing the sequence of moves, we asked the subject what he would call
such a sequence of moves. This procedure examined whether the Go terms were
developed from the board situation. The board situation recognition experiment
was given to beginners and intermediate level players.

Fig. 2. Example of board situation presented to subjects (example of
“Karami-zeme”)



The results of the experiment are shown in Table 6. Note that a loose criterion
for the correct answer was used. For example, when the subject answered only
“Harazuke” instead of “Itachi-no-harazuke”, we judged the answer as correct.

The problems presented in this test are those for which the subjects failed
to give correct answers in the vocabulary test (see footnote 2).

“The correct rate for original 100 words” is calculated by summing the
number of correct answers in this experiment and the number of correct answers in
the vocabulary test in Section 4.1. The result shows that the recognition rate does
not change very much with skill.
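
As a quick arithmetic check, the “Correct answer” column of Table 6 is
consistent with dividing the number of correct answers by the number of
effective problems. That reading is an assumption on our part, since the
text does not spell out the formula; the helper name below is hypothetical.

```python
def correct_rate(correct, effective):
    """Percentage of effective problems answered correctly, rounded to an integer."""
    return round(100 * correct / effective)


# (number of correct answers, number of effective problems) from Table 6.
rows = {"beginner": (18, 34), "intermediate": (27, 46)}
rates = {subject: correct_rate(c, e) for subject, (c, e) in rows.items()}
print(rates)  # {'beginner': 53, 'intermediate': 59}
```

Both values agree with the table, which supports the observation that the
recognition rate is close for the two subjects.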

5 Protocol Analysis of Soudan-Go 

The protocol analysis of ordinary play showed that there was a difference in
the use of terms and in the content of the subjects’ introspective protocols
according to skill. In the above recognition test, there was no big difference in
the ability to recognize the terms from a board situation according to skill. Why
is there a big difference in the use of Go terms in the protocols, even though
the players have the ability to use the terms? We analyzed the “Soudan-Go”
protocols in order to identify the reason. Soudan-Go is a game between two
groups, each of which consists of two players who can talk to each other freely.
The combination of the groups is shown in Table 7.

2 Some problems which were not presented in the vocabulary test were presented in
this test to the beginner.

Table 7. Collected Playing Soudan-Go Protocols

Match  Black   White   Total  Result          Protocol  Remarks
No.    Ranks   Ranks   Moves                  Size
8      4d, 1d  4d, 3d  140    Black resigned  205KB
9      4d, 1d  4d, 1d  177    White resigned  158KB
10     4d, 2k  4d, 1d  91     White resigned  89KB      13x13

Table 8. Houchi Soudan-Go Protocols

Match  Black                            White                       Total  Result                   Protocol  Remarks
No.                                                                 Moves                           Size
11     M. Kitani 6P, S. Go 5P           T. Suzuki 7P, K. Segoe 7P   245    Black won by 1 stone     1.5MB     No Komi
12     K. Segoe 7P, S. Go 6P            T. Suzuki 7P, M. Kitani 7P  286    White won by 8.5 stones  2.2MB     3.5 point Komi
13     K. Iwamoto 6P, U. Hashimoto 6P   M. Kitani 7P, C. Maeda 6P   190    Black resigned           1.5MB     3 point Komi

Also, we analyzed “Houchi Soudan-Go” protocols to examine the word usage
of expert players. The protocols were taken from printed records as reported in
the Houchi Newspaper. While there is no guarantee that the original utterances
were reproduced exactly, we found that the utterance categories were preserved.
Therefore, we only analyzed the utterance part of the report. Table 8 shows the
list of Houchi Soudan-Go games.



5.1 Analysis of Soudan-Go 

By comparing Table 4 to Table 9 we find that advanced players do not speak
more in explaining their selections than intermediate players. One reason is that
advanced players generated very few candidate moves, usually only one. Another
reason is that an advanced player’s thinking is compactly conveyed to his partner
by the use of appropriate terms. One example is shown below.

[After black plays the 53rd move] Black1: ... if white plays kake, we play
keima, (Black2: Yeah) and if white plays kake again. (Black2: Yes)
What do we do? We have other potential power, so this black can live
easily. (Black2: yes) (Long pause) [white plays the 54th move] Ah, here
they come. Black2: Ah, they played.

Table 9. Percentage of basic units containing various contents (%).

Contents                     advanced
                             (match 9)
Purposes ((UP) and (EP))     54.0
Candidate move ((CM))        25.1
Lookahead ((L))              3.9
Judge winning or not ((VJ))  1.4
Reason of selection ((SR))   0.6
Global plan ((EG))           3.0

In match 5, we explicitly asked players to explain their candidate moves and 
their reasons for their selections; they could explain their thinking using short 
Go terms. 

The following example shows that naming, which means identifying the board
situation, is very important in recognizing the board situation and selecting a
strategy. “Kake” in the following example implies that the opponent should answer
the keima. We found that terms were charged with various meanings.

In the Soudan-Go match (match 8), after the game, the two parties came to
the same room and discussed what was good and what was bad.
In the end black lost the game, and they attributed the cause of their loss
to white’s 54th and 56th moves. In the protocol, black refers to
these two moves as “two kakes in the center”, while white refers to them
as “two keimas”.

Black1: ...We estimated the situation too optimistically, we overlooked
the effect of kake. Experimenter: Where is it? More towards the
beginning? Black1: It was a kake. Black2: White’s kake in the center.
Black1: Those two kakes in the center. We knew that white would play them,
but we did not take it seriously. Black2: We said “not so serious”.
White1: Ah, those two keimas? Black1: Yes, yes. Black2: Yes, yes.
Experimenter: Ah, these moves. Black1: We underestimated their
effect.

Both kake and keima were used for that situation. Because the black players felt
that the white stone was superior to the black one and that the white stone
oppressed the black one, they called the situation kake. Keima is just a posture
category term.

5.2 Analysis of Houchi Soudan-Go

Houchi Soudan-Go was held from 1936 to 1938. The sponsor was the Houchi 
newspaper company. The aim of the series was to present the professional players’ 
thinking to citizens. 

One example is shown below. This example is the opening of a game (match
13). The professionals discussed the local strategy.

Kitani: I think opponent should place a stone at a star point or the 
komoku point. Woo. Probably a star point. If they select a star point, 
most possible star point is at the upper right corner, then we select the 
diagonal star point, don’t we? Maeda: But star point is too simple. 
Shall we select diagonal oh-takamoku point? Kitani: If we adopt your 
suggestion, the opponent should place a stone at star point. Then we 
have to select kosumi position to complete. Then that move makes this 
game very busy, I dislike the move. Of course we can continue to play 
after selecting the move, but star point may not cause our errors, I think. 
Maeda: The second move depends on the opponent’s move. What is the 
third move of the opponent? Takamoku was tested before Soudan-Go, 
so they change the place. Probably, they select star point or komoku at 
the lower right corner. Kitani: That is the next consulting subject just 
after we watch the opponent’s moyo. Anyway, we decide if the opponent 
select the upper right star point, then we select diagonal star point. 

This example shows that even when expert players used only position
terms, the terms implied judgment and planning. For example, Kitani said “busy”
in the middle of the quotation. “Busy” is a judgment of the future situation,
not the current one. They shared the same future board situation using only two
words, oh-takamoku and kosumi. So the term oh-takamoku here carries
a plan for building a future image of the game.

In the middle game, they often discussed evaluations, for example, “that is
a solidly built move” and “and then the opponent selects the kosumi, which causes
an unbelievable situation”. That is, (VM) is used more often in their utterances
than in those of advanced players. Professional players’ utterances suggest or
predict move(s) and evaluate them. This form is similar to that of advanced
players.

Although both advanced players and professionals explain their evaluations,
the level of the explanation differs. While professionals use terms such as “unable
to escape” or “becoming a decisive battle,” advanced players say just “connecting”
or use terms in the “form and position” category of Shirayanagi’s classification, a
quite different response.

6 Discussion 

The recognition test showed that all subjects except novices have the ability 
to recognize board situations in Go terms. However, the results of the protocol 




Relations between Skill and the Use of Terms 






analysis of ordinary Go showed that the use of the terms in the protocols depends
strongly on the level of skill. There is also a difference in the utterances.
Intermediate or better players can discuss purpose, evaluation, and planning. One
of the differences between intermediate players and better players is in the use
of utterances to explain selections. Intermediate players gave detailed
explanations, while professional players used abstract reasoning terms, such as
“busy” or “quick”. Beginner and intermediate players evaluated their moves using
sentences, while advanced players made evaluations by using Go terms. Professional
players evaluated their choices by using Go terms and abstract words. Furthermore,
professional players used form and position terms in order to convey their
purposes and ideas to their partner.

Therefore, we have to explain the role of terms. Any explanation must satisfy
three constraints.

— Even form & position terms can describe the purpose. 

— When a subject looks at the board, he/she is conscious of purpose and future 
image expressed through the Go terms. 

— Unskilled players cannot use Go terms like advanced players. 

We propose the iceberg model, which satisfies the above constraints, to explain
how the usage of terms in utterances varies with the skill of the
player.



6.1 Iceberg Model 

The iceberg model (Fig. 3) attempts to explain the observed differences in the
usage of Go terms. Each Go term is an iceberg. The tip of the iceberg lies above
the surface of the water and the rest lies below the surface. The upper part can
be easily observed, while the lower part is more difficult to observe directly.

Each Go term has a lexical meaning, which has a dictionary description and
is easy to define, such as a configuration of stones. This is the part above the
surface of the water. All subjects except novices understand the lexical meanings
of Go terms, as shown by the recognition test. However, we think that the terms
have various meanings other than the lexical meanings, as was observed in the
protocols. For example, when someone says “the opponent puts the stone at the
position of keima, which is cutting, and then we select deru”, “deru” means not
only going through the opponent’s wall, but also taking advantage. The meaning,
“taking advantage”, emerges from the interaction of the term and the current
situation of the board. In other words, we think that the term itself may suggest
the emergent meaning by referring to the current situation of the board. The
emergent meaning is the part below the surface of the water.

Advanced players explained their reasoning at an abstract level using only
Go terms. In the case of Soudan-Go, neither professional nor advanced players
used concrete terms to explain their reasoning. Although the terms could
be interpreted in different ways, misunderstandings did not occur between the
partners, as shown by their agreement in the discussion that followed. On the
other hand, intermediate players clearly explained their reasoning in detail
using concrete words. Professional and advanced players used only the Go terms,
which include the meaning of the intermediate players’ explanations. When an
advanced player says a term to another advanced player, the emergent meanings
of the situation or values are also conveyed to the partner. Emergent meanings
of terms differ according to skill.

Fig. 3. Iceberg Model

Let us consider frame representation in AI to explain emergent meaning. Imagine
that each iceberg consists of slots. When a player understands the board
situation, he fills the slots by extracting information from the board. One prob- 
lem exists: whether all slot names are decided beforehand or not. Let us consider 
the following examples. When a player’s situation is bad and he says “opponent 
selects mage, then I select hane, then he selects hane ” , hane of one’s own play 
implies endurance or defense. When the situation is good, however, it implies 
offense. In short, the meaning of the terms used depends on one’s situation. 
However, we suppose that there are not so many meanings in a term and that 
a class of slots may be decided beforehand for a player’s level, even though the 
class of slots differs enormously between different level players. In other words, 
slot names are not defined beforehand, but the class of slots is defined. This flex- 
ibility is the source of the emergent meaning. In the original concept of frame 
representation, all the slots should be defined beforehand. This is the difference 
between the traditional frame representation and our model. 

For the submerged part of the iceberg, slots are filled only when the subject
looks at the board. Those slots consist of on-the-fly knowledge. So, when a
subject says the term to his partner, the partner has to construct the
submerged part by looking at the board. Only when they have the same slot names
in this part can they understand each other.







The size of the iceberg depends on the player’s skill. Advanced players have
larger icebergs, and the submerged parts are larger than those of intermediate
players. This is the result of an increase in the number of slots.
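
The frame-like structure described in this subsection can be sketched as a
small data structure. This is only one possible reading of the iceberg model,
not an implementation given in the paper: the class name, the slot classes
("shape", "purpose") and the example observations are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GoTermIceberg:
    """One Go term as held by one player."""
    name: str
    lexical_meaning: str      # tip of the iceberg: the dictionary description
    slot_classes: frozenset   # classes of slots fixed in advance for this player's level
    slots: dict = field(default_factory=dict)  # submerged part, filled on the fly

    def fill(self, observations):
        """Fill the submerged slots from what the player reads off the board.

        `observations` is an iterable of (slot_class, slot_name, value).
        Slot names are not fixed beforehand; only their class is checked.
        """
        self.slots = {
            name: value
            for slot_class, name, value in observations
            if slot_class in self.slot_classes
        }
        return self.slots


def understand_each_other(a, b):
    """Partners understand each other only if their submerged slot names match."""
    return set(a.slots) == set(b.slots)


# Hypothetical board reading while one's own situation is bad.
board = [("shape", "adjacent_enemy_stone", True), ("purpose", "defense", True)]

speaker = GoTermIceberg("hane", "dictionary description", frozenset({"shape", "purpose"}))
partner = GoTermIceberg("hane", "dictionary description", frozenset({"shape", "purpose"}))
weaker = GoTermIceberg("hane", "dictionary description", frozenset({"shape"}))

for player in (speaker, partner, weaker):
    player.fill(board)

print(understand_each_other(speaker, partner))  # True
print(understand_each_other(speaker, weaker))   # False
```

In this sketch a stronger player simply has more slot classes available, so
the same term fills more slots from the same board, which corresponds to a
larger submerged part.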



6.2 Comparison of Iceberg Model with Template Theory 

A model similar to the iceberg model was proposed by Gobet and Simon [5]. It
was called “template theory.” The model was introduced to explain how mem- 
ory chunks @ evolve into templates. That is the big difference from our iceberg 
model. However, it is interesting that the structure is very similar. The template
also consists of frames. The main structural difference between the iceberg model
and template theory is the slot. The slots of the iceberg model are only
approximately decided, while the slots of template theory are predefined. Of course,
flexibility is possible in template theory by choosing from among various
kinds of slots. The difference reflects the differences between Go and chess, which
are described in the following two examples.

Our previous study showed that expert players hold the board image as
a mixture level description [10]. The mixture level description was reported to
be a uni-space structure. That is, stones involved in an area of focus are
clearly remembered, while other areas are unified at an abstract level. More
precisely, when advanced players had enough time to observe and recall the board
situation, they could reconstruct the board by recalling the relationships between
stones. When recall was performed under time pressure, they recalled only stones
that were associated with important areas, and other areas were represented
using features such as “black area” or “white is stable.” Which stones can
be recalled does not depend on the distance from the focused area. Even if the
board was small, such as a 9 by 9 board, the effect was the same. Another
of our studies showed that human players recalled the board situation in
sequence order even when they had only observed a static pattern [12]. In recall,
they often verbalized form and position category words in Shirayanagi’s
classification. Therefore, advanced players could easily find sequential meaning.
These two studies indicate that human memory of static Go boards is based on a
uni-space of meaning that includes the concrete shape of stones.

It is well known that advanced Chess and Shogi players can easily reconstruct
a board by hearing the game record. The same is not true in Go, even with a 9
by 9 board. The game record of Go consists of stone color and position. The
imaging task is made easier if Go terms as well as the game record are given
to the player [11]. Specifically, the correct imaging rate is high when both form
and position category terms and content and meaning category terms are given
in addition to the game record.

The above discussion indicates that for humans to perceive the board
situation of Go, they must invest each configuration of stones with a role. In the
case of Chess or Shogi, the role of each piece is already given by the definition of
its movement. So, in Go, more basic-level perception is necessary than in chess.
Therefore, in Go, humans must invest a configuration of stones with multiple roles,
described by multiple terms. Thus we cannot fix a static relation between
configurations of stones and basic functions beforehand. The player must assign a
function to a configuration of stones dynamically according to the board situation.

Intermediate players have a structure of knowledge quite similar to that of 
the template theory. Advanced and professional players, even if they use position 
terms, assign to the term information about the player’s plan, strategy, judgment 
and so on, all of which are strongly dependent on the situation. Plan and strategy 
sometimes are contained together in the same term. Accordingly, they do not 
exist as independent slots. 

7 Conclusion 

We analyzed the Go protocols expressed during ordinary play and Soudan-Go. 
We carried out experiments to see how many Go terms were known to a wide 
range of subjects. As a result, all subjects, except the begineers, knew the Go 
terms, but term usage differed with the subject’s skill. The iceberg model was 
proposed in order to explain the results.

Abstract. Recently, a number of programs have been developed that
successfully apply variable-depth search to find solutions for mating
problems in Japanese chess, called tsume shogi. Publications on this re- 
search domain have been written mainly in Japanese. To present the 
findings of this research to a wider audience, we compare six different 
tsume programs. For difficult tsume-shogi problems with solution
sequences longer than 20 plies, we will see that variable-depth search,
combined with hashing to deal with transposition, domination and
simulation, leads to strong tsume-shogi programs that outperform
human experts, both in speed and in the number of problems for
which the solution can be found. The best program has been
able to solve Microcosmos, a tsume-shogi problem with a solution se- 
quence of 1525 plies. 

Keywords: Variable-depth search, best-first search, conspiracy-number 
search, game playing, tsume shogi 



1 Introduction 

Most work on game-tree search has focused on algorithms that make the same
decisions as full-width minimax search to a fixed depth. Examples are alpha-
beta pruning and SSS*. Human players do not use full-width fixed-depth
search, but a combination of shallow search and deep search [5]. A number of
algorithms have been proposed that perform variable-depth search. For example,
conspiracy numbers, singular extensions, proof-number search
and best-first minimax are all algorithms for searching game trees without
explicit bounds on the search depth.

One of the domains where search to variable depths has been very success- 
ful but where this success has been almost unnoticed by the international AI 
community is tsume shogi. Tsume shogi are mating problems in Japanese chess. 
Since the early 90s, several tsume-shogi programs have been developed that can 
quickly find the solution of problems with solution sequences of more than 50 
plies. Both on the development of strong tsume-shogi programs and the char-
acteristics of the programs, there are a number of publications in Japanese.
There have been a few English publications on tsume shogi, but only Seo and
Kawano give a description of a program for finding the solutions to difficult
tsume-shogi problems.¹ Some features of a tsume-shogi program have been
described elsewhere, but that description is very brief.

H.J. van den Herik, H. Iida (Eds.): CG’98, LNCS 1558, pp. 300- 131 Y I 1999.

In this paper, we would like to present the results of tsume-shogi research to 
a wider audience. In Section 2 a short explanation of the rules of tsume shogi
is given, along with its history and its relevance for a shogi-playing system. In
Section 3 the computational features of a program to find solutions for tsume- 
shogi problems are given. In Sections 4 to 9, a number of tsume-shogi programs 
are described. Also, their results on different test sets of tsume-shogi problems 
are summarized. We end with some conclusions and thoughts on future research 
on a general shogi-playing system using methods discussed in this paper. 



2 Tsume Shogi 

2.1 Rules of Tsume Shogi 

Tsume shogi are mating problems in Japanese chess. As far as the rules of 
Japanese chess are concerned, to understand the contents of this paper it is 
sufficient to know that shogi is similar to chess. The aim of the game is the same 
as in chess, namely the mating of the opponent’s king. The shogi board is slightly 
bigger than the chess board, 9x9 instead of 8x8. Some pieces in shogi are the 
same as in chess, like the rook and the bishop, but some pieces are different. 
There is no queen in shogi, but instead there are golden generals, silver generals 
and lances. Promotion is also a little different. Most pieces can promote and can 
do so on any of the top three ranks of the board. The most important difference 
between shogi and chess is that in shogi captured pieces can be re-used. A piece 
captured becomes a piece in hand for the side that captured it. When it is a 
player’s turn, he can either play a move with a piece on the board or put one 
of the pieces previously captured back on a vacant square on the board (this is 
called dropping a piece). It then becomes his own piece. Most drop moves are 
allowed, even dropping with check or mate is legal. Finally, it should be noted 
that in shogi the player to move first in the starting position is black and the 
other player is white (in chess this is the other way around). A more detailed
comparison of chess and shogi can be found in the literature.

The rules of tsume shogi are simple. The goal of the attacking side (black) 
is to mate the king by consecutive checks; the goal of the defending side (white) 
is to reach a position where the attacking side has no checks. Therefore, the 
attacking side has to give check at every move and the defending side has to 
defend against these checks and prevent mate as long as possible. 

2.2 History 

Tsume shogi has a long history. The first tsume-shogi problems date back to
the 17th century, and the collection of tsume-shogi problems that is still
considered to be the most brilliant ever was published in 1755(!) and was
composed by Ito Kanju. Also, there are many books with collections of
tsume-shogi problems and all shogi magazines have tsume-shogi problem corners.
There is even a monthly magazine called Tsume Shogi Paradise dedicated to
tsume shogi. Of course, in tsume-shogi problems that are published in shogi
magazines the artistic element is very important, just as in chess problems.
Recently, there have been attempts at automatically composing tsume-shogi
problems with artistically appealing features.

1 Seo’s work is described in a Master’s thesis, so it is not easily available.

2.3 Relevance for Shogi Game Playing 

Tsume shogi is not the same as perfect endgame play. It is possible that a game 
can end in fewer moves than by consecutive checks. However, tsume shogi is 
very important in the shogi endgame. In the endgame, the number of pieces in 
play is the same as on the first move. Furthermore, dropping pieces can have 
a major impact on the strength of attack or defense. Therefore, mate is the 
prime objective and resignation because of material deficit is rare. Usually a 
player resigns when he can no longer avoid mate, either because the opponent
has started a tsume sequence (continuous checks leading to mate) or because there
is no defense against such a tsume threat. The shogi endgame is a mating race, so
finding mate and realizing that the opponent is threatening mate is vital for the 
endgame strength of a shogi player and also for a shogi-playing program. 

2.4 The First Tsume-Shogi Program 

The first tsume-shogi program was built by Ochi in 1968.
Ochi’s program has been reported to find the solution of tsume problems with 
a solution sequence of 9 to 11 plies as quickly and accurately as human players. 
Even though the program ran on one of the fastest computers of its time, this 
is quite an incredible result for a program that is 30 years old. We have been 
unable to find the original paper with a description of the program and the 
supporting data for this claim. In any case, for 25 years there was no break- 
through in the development of tsume-shogi programs that made it possible to 
find the solution of problems with a solution of 15 plies or more. This changed 
with the introduction of algorithms for searching to variable depths in the early 
1990s. 

3 Computational Features of Tsume Shogi 

A correct tsume-shogi problem should have only one solution. The search tree 
for a tsume-shogi problem is an AND/OR tree, or a minimax tree where the 
evaluation of every node can have only two values, TRUE or FALSE. At each 
OR-node it is sufficient to find one check for which all the defenses lead to mate. 
If there is one check that leads to mate from the root, the problem is solved and 
the search can be stopped.

A tsume-shogi program that searches an AND/OR tree has to deal with several
problems:

— search deep for long mating sequences 

— avoid redundant search 

— recognize that positions have the same mating sequence 

— decide which moves to search first 

We will now describe each of these problems in detail. 
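
Before turning to the individual problems, the AND/OR evaluation described at the start of this section can be illustrated with a minimal sketch. The `Node` class and the `solve` function are hypothetical names introduced here for illustration only; the sketch assumes a fully built toy tree whose leaves already carry a TRUE (mate) or FALSE (no mate) value, and it ignores everything specific to shogi.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A node in a toy AND/OR tree (hypothetical structure, not from the paper)."""
    is_or: bool                                   # True: attacker to move (OR-node)
    children: List["Node"] = field(default_factory=list)
    mate: Optional[bool] = None                   # terminal value, used for leaves only


def solve(node: Node) -> bool:
    """Return True if the attacker can force mate from this node."""
    if not node.children:
        # Leaves must carry a terminal value; an unknown leaf counts as no mate.
        return bool(node.mate)
    if node.is_or:
        # One check that leads to mate is sufficient; any() stops at the first one.
        return any(solve(child) for child in node.children)
    # Every defense must lead to mate; all() stops at the first refutation.
    return all(solve(child) for child in node.children)


def leaf(value: bool) -> Node:
    """Helper that builds a terminal node with a known value."""
    return Node(is_or=True, mate=value)


# The first check fails against one defense, the second check always mates.
root = Node(is_or=True, children=[
    Node(is_or=False, children=[leaf(True), leaf(False)]),
    Node(is_or=False, children=[leaf(True), leaf(True)]),
])
solved = solve(root)
```

In a real solver the tree is generated during the search instead of being built in advance, which is where the four problems listed above come in.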

3.1 Problem 1: Long Solution Sequences 

Tsume-shogi problems have a solution length ranging from 3 plies to hundreds
of plies.² Seo has experimentally found that the average branching factor of
tsume shogi is only about 5. This is much smaller than the average branching
factor of normal shogi play, which is about 80. However, even with this small
branching factor, it is difficult to find the solution of tsume-shogi problems with 
solutions of more than 15 plies by brute force methods. Currently the tsume-shogi 
problem with the longest solution sequence is a problem called Microcosmos, 
which has a solution of 1525 plies. Finding long solutions of tsume-shogi problems 
is the first challenge for tsume-shogi programs. 
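
To give a feeling for why brute force breaks down around 15 plies, the following small calculation estimates the size of a uniform tree with the average branching factors quoted above. This is only a back-of-the-envelope sketch: `tree_size` is a name introduced here, and a uniform tree is an assumption, since real tsume-shogi trees are far from uniform.

```python
def tree_size(branching: int, depth: int) -> int:
    """Number of leaf positions in a uniform tree: branching ** depth."""
    return branching ** depth


# Average branching factor of about 5 for tsume shogi (Seo's estimate in the text).
tsume_11 = tree_size(5, 11)    # 48,828,125 positions
tsume_15 = tree_size(5, 15)    # 30,517,578,125 positions

# Average branching factor of about 80 for normal shogi play, same depth.
normal_15 = tree_size(80, 15)
```

Even with a branching factor of 5, going from 11 to 15 plies multiplies the number of positions by 625, which is consistent with the observation that a full-width search stops being practical at about this depth.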

3.2 Problem 2: Avoiding Redundant Search 

To avoid searching the same position from different parts of the search tree, 
usually a standard transposition table is used. In shogi, not only transposition, 
but also domination can lead to redundant search. Domination is a concept that 
is used in every tsume-shogi program, but it has not been properly defined in 
the literature. We define it as follows: 

Definition 1 A position P dominates a position Q if the board positions of P
and Q are the same, and the pieces in black’s (white’s) hand in P are a proper
superset of the pieces in black’s (white’s) hand in Q and it is black’s (white’s)
turn to play in both P and Q.

In tsume shogi, if P dominates Q regarding the attacker’s pieces and a mating 
sequence has been found in Q then this mating sequence will also work in P. 
One way to check this is to store extra information about the pieces in hand in
the hash table. 
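
A minimal sketch of such a table is given below, assuming that the board position and the side to move are summarized by some hashable key and that the attacker's pieces in hand are kept as a multiset. The names `covers`, `dominates` and `MateTable` are hypothetical and are not taken from any of the programs discussed in this paper.

```python
from collections import Counter
from typing import Dict, Hashable, List, Mapping


def covers(hand_p: Mapping[str, int], hand_q: Mapping[str, int]) -> bool:
    """True if hand_p holds at least every piece of hand_q (multiset superset)."""
    return all(hand_p.get(piece, 0) >= count for piece, count in hand_q.items())


def dominates(hand_p: Mapping[str, int], hand_q: Mapping[str, int]) -> bool:
    """True if hand_p is a proper superset of hand_q, as in Definition 1."""
    return covers(hand_p, hand_q) and sum(hand_p.values()) > sum(hand_q.values())


class MateTable:
    """Toy hash table: board key -> attacker hands for which mate was proven."""

    def __init__(self) -> None:
        self._proven: Dict[Hashable, List[Counter]] = {}

    def store_mate(self, board_key: Hashable, attacker_hand: Mapping[str, int]) -> None:
        self._proven.setdefault(board_key, []).append(Counter(attacker_hand))

    def known_mate(self, board_key: Hashable, attacker_hand: Mapping[str, int]) -> bool:
        # A stored mate also holds if the attacker now has the same hand or more.
        return any(covers(attacker_hand, stored)
                   for stored in self._proven.get(board_key, []))


table = MateTable()
table.store_mate("board-1", {"gold": 1})
hit = table.known_mate("board-1", {"gold": 1, "pawn": 2})   # dominating hand
miss = table.known_mate("board-1", {"pawn": 2})             # gold is missing
```

The linear scan over stored hands is a simplification; a practical program would need a more compact encoding of the pieces in hand.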

3.3 Problem 3: Finding Simulating Positions 

In chess, the number of checks and defenses against these checks is very limited. 
In tsume shogi the possibility of dropping pieces greatly increases the number of 
possible moves and makes the problem much harder. For example, in Diagram
1, the defending white king on 1a is checked by the black rook on 9a.³ The white
pawn on 1b blocks the king’s escape to 1b. A lance in shogi is a piece that moves
like a rook but only in the forward direction. The black lance on 2i therefore
blocks the escape of the king to 2b and also covers 2a. A similar position in
chess would be mate.

2 A position where there is mate in one move is trivial and not considered a tsume-shogi
problem.

3 The possibility of dropping pieces makes the total number of pieces in any shogi
position the same (40). In Diagram 1 only the relevant pieces are shown.

Diagram 1. An example of interpositions



However, in this shogi position the defense has all the pieces on the left side 
of the board in hand. The defending side can therefore drop any of these seven 
pieces between the checking rook and the king. Since there are seven vacant 
squares between rook and king and seven different pieces to drop, the number 
of interposing defenses is 49. 

A rule of tsume shogi is that drops which do not change the solution sequence 
but only increase its length are not considered proper defenses. These useless 
interpositions are not counted as moves in the length of the solution sequence. 
Since all the interposing moves in Diagram 1 are useless (black can take any 
interposing piece immediately with the rook) , this position is mate according to 
the rules of tsume shogi. For a tsume-shogi program to recognize when drops are 
useless for check or defense with as little search as possible is a non-trivial task. 

Another complication of tsume shogi is that in most cases promotion of pieces 
is optional. For tsume shogi, this usually leads to the same move sequence, since 
the moves of the promoted piece are in general only slightly different from the 
moves of the unpromoted piece. However, even in the case where the moves of 
a promoted piece are a superset of the moves of the unpromoted piece, there 
are some special cases where promotion of a piece does not lead to mate, while 
the non-promotion of a piece does. As a result, a tsume-shogi program has to 
search the moves where a piece promotes and also the moves where the piece 
does not promote. Since in shogi a piece can promote on any of the top three 
ranks, this considerably increases the number of moves to be searched. A tsume-
shogi program should avoid redundant search for optional promotion moves as
much as possible.

To deal with the problems of useless interpositions and optional promotion, 
the concept of simulation has been introduced. The general definition of simu-
lation is:

Definition 2 A position P simulates position Q if all move sequences that can
be played from position P can also be played from position Q.

For example, the moves of a promoted rook are a superset of the moves of an 
unpromoted rook. If P is a position with an unpromoted attacking rook and Q 
is the same position with a promoted attacking rook, then P simulates Q. For 
tsume shogi, it would seem logical that if there is a mate in position P, there 
is also a mate in Q. However, as mentioned above, there are some special
cases where a position with a promoted rook does not lead to mate, while a
position with an unpromoted rook does.

Also, moves may have the same meaning even though they are not exactly 
the same. For example, a mate in position P is found and next position Q is 
searched where the only difference between the two positions is that the starting 
square of the rook is shifted one square to the left or right. This is a case where 
P might simulate Q. In these cases, it is natural to try the mating sequence in P 
first. However, since the starting square of the rook move differs, these positions 
do not simulate each other according to the strict definition given above. 

Therefore, usually a more general, but heuristic concept of simulation is used 
in tsume-shogi programs. Instead of searching for simulating positions where the 
move sequences must be the same, some shogi dependent knowledge is used to 
look for positions where the move sequences are likely to be the same. If the 
current position P might simulate a position Q based on these heuristics and a 
mate from P has been found, then the mating sequence found in P is tried first 
in Q. In tsume-shogi programs the following heuristics are used for simulation: 

— promotion vs. non-promotion 

— different starting squares for the same long range piece (rook, bishop and 
lance) 

3.4 Problem 4: Most Promising Moves First 

The final challenge for a tsume-shogi program is to guide the search for a solu- 
tion in the right direction. Expert human tsume-shogi solvers are very good at 
selecting promising candidate moves from a position. Usually the first or the sec- 
ond move considered in a position is the move leading to mate. To avoid wasting 
time by searching moves that are not likely to lead to mate, move ordering is 
very important. 

Now that the problems a tsume-shogi program has to deal with have been dis-
cussed, we will give a description of six tsume-shogi programs:

— Noshita’s T1, T2 and T3 tsume-shogi programs

— Ito’s tsume-shogi program 

— Kawano’s tsume-shogi program 

— Seo’s tsume-shogi program 

These are the strong tsume-shogi programs for which a detailed description
of the methods used has been published. There are other strong tsume-shogi
solvers, but these are part of commercial shogi software and there have been
no publications on them. Especially famous in this category is a program called
Morita Shogi. Published results of this program show that it might be just as good
as the programs described here. It would be very interesting to know if this 
program uses different methods for finding solutions in tsume-shogi problems. 



4 T1: Iterative Alpha-Beta with Selective Deepening

4.1 Method 

In 1991, Noshita’s T1 program became the first program to successfully use
selective deepening in combination with iterative alpha-beta search to find
solutions of tsume-shogi problems with solutions longer than 11 plies. T1
uses alpha-beta iterative deepening with only limited selective deepening. The
selective deepening is based on the heuristic of measuring the freedom of the 
king. After each move, for each of the eight squares adjacent to the king it is 
calculated if the king can move to this square or not. If the king is very limited 
in its movement, the position is assumed to be close to mate and the search is 
extended for a maximum of 4 plies to try and find a mate. 
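
The king-freedom heuristic can be sketched as follows. This is an illustration under stated assumptions and not Noshita's code: the callback `can_move_to` stands in for a real legality test, and the `threshold` below which the king counts as "very limited" is a made-up parameter, since the text does not give the actual value.

```python
from typing import Callable, Tuple

Square = Tuple[int, int]   # (file, rank), both in 1..9 on the shogi board


def king_freedom(king: Square, can_move_to: Callable[[Square], bool]) -> int:
    """Count the squares adjacent to the king that the king can move to."""
    file, rank = king
    count = 0
    for d_file in (-1, 0, 1):
        for d_rank in (-1, 0, 1):
            if d_file == 0 and d_rank == 0:
                continue                      # the king's own square
            target = (file + d_file, rank + d_rank)
            on_board = 1 <= target[0] <= 9 and 1 <= target[1] <= 9
            if on_board and can_move_to(target):
                count += 1
    return count


def extension_plies(freedom: int, threshold: int = 1, max_extension: int = 4) -> int:
    """Extend the search by up to max_extension plies when the king is cramped.

    The threshold is an assumption; only the maximum of 4 plies is from the text.
    """
    return max_extension if freedom <= threshold else 0


# A king in the corner with every neighbouring square available has freedom 3.
corner_freedom = king_freedom((1, 1), lambda square: True)
# A king whose neighbouring squares are all covered has freedom 0.
cramped_freedom = king_freedom((5, 5), lambda square: False)
```

A position with `cramped_freedom` would trigger the extension, while the corner king with three free squares would not under the assumed threshold.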

T1 puts a lot of emphasis on move ordering. There are no fewer than 60
criteria for move ordering. Examples are: ordering promotions higher than non- 
promotions, preferring king moves away from the attacking pieces and ordering 
drops higher the closer they are dropped to the king. Noshita also used parameter 
learning to tune these move ordering criteria. 
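
As a rough illustration of how such criteria can be combined, the sketch below scores attacking moves with two of the criteria mentioned above: promotions before non-promotions, and drops ranked higher the closer they land to the king. The `Move` class, the weights and the distance measure are all assumptions made for this example; T1's actual 60 criteria and their tuned values are not given in the text.

```python
from dataclasses import dataclass
from typing import List, Tuple

Square = Tuple[int, int]   # (file, rank)


@dataclass(frozen=True)
class Move:
    """Hypothetical move record used only for this ordering example."""
    dst: Square
    is_drop: bool = False
    promotes: bool = False


def distance(a: Square, b: Square) -> int:
    """Chebyshev distance: the number of king steps between two squares."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def attack_score(move: Move, king: Square) -> int:
    """Higher scores are searched first; the weights are illustrative only."""
    score = 0
    if move.promotes:
        score += 10                              # promotions before non-promotions
    if move.is_drop:
        score += 8 - distance(move.dst, king)    # closer drops score higher
    return score


def order_moves(moves: List[Move], king: Square) -> List[Move]:
    """Sort candidate checks from most to least promising."""
    return sorted(moves, key=lambda m: attack_score(m, king), reverse=True)


king_square = (1, 1)
far_drop = Move(dst=(5, 5), is_drop=True)
near_drop = Move(dst=(2, 1), is_drop=True)
promotion = Move(dst=(3, 3), promotes=True)
ordered = order_moves([far_drop, near_drop, promotion], king_square)
```

Parameter learning, as used by Noshita, would amount to adjusting weights such as the 10 and 8 above against a set of solved problems.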

T1 has some heuristics to deal with transposition and domination, but has
no hash tables. To avoid useless interposing drops, T1 uses the definition of the
shogi programmer Kakinoki:

Definition 3 (i) An interposed piece is useless if capturing the piece results
in a mating position. (ii) Suppose the king is mated in N moves in a certain
position P without any interposing drop between checking piece and king. Then
an interposed piece in P is useless if it is immediately captured, and there is a
mating sequence of length N after the capture and the captured piece is not used
in the mating sequence.

In T1 this is applied as follows. If a piece is dropped between the checking
piece and the king on square S and can be captured immediately, this might be
a useless interposing drop. If capturing the piece results in mate, the drop was
useless according to (i), all other pieces that can be dropped on S are also useless
interposing drops, and search can be stopped. If taking the dropped piece does
not give a mating position, but a move sequence is found leading to mate after
the capture without using the captured piece, then the dropped piece on S was
a useless interposing drop according to (ii) and search can be stopped.



4.2 Results 

T1 was implemented on a 16 MHz PC98-RA21 (a Japanese PC) and on a SUN 
IPC workstation. Here we will summarize the results of three tests performed
with the program. One was a test set of 50 tsume-shogi problems made by
the professional shogi player Nakata Shodo. The length of the solution sequence 
was 7 to 15 plies. The performance of T1 was compared to that of human experts. 
T1 found the solutions of the entire problem set in about 27 minutes, while the 
best human expert could only solve 46 problems within a one hour time limit. 

There are some unanswered questions with this test. One is that the original 
test set had 54 problems. It is unclear why only 50 of those problems were 
selected. Also, it is unclear how many human subjects there were and what the 
exact test conditions were. 

The second test given to T1 was a set of 32 difficult tsume-shogi problems 
from a shogi magazine. The solution length of the problems was 7 to 9 plies. T1 
found the solution of all the problems with an average solution time per problem 
of 55 seconds. According to the scoring table attached to the tsume problems, 
this is the highest possible level: “professional strength” . Of course, the solutions 
of these problems are not long enough to really test the program. After all, the 
old program by Ochi mentioned before already claimed the same performance. 

The final test for T1 consisted of eight 15 ply tsume-shogi problems and eight
17 ply tsume-shogi problems. The 15 ply tsume problems were solved in 130 seconds
on average, while the 17 ply tsume problems were solved in about 10 minutes
on average.

Noshita’s own conclusion is that T1 works well for tsume-shogi problems 
with a solution sequence up to 17 plies and is able to find the solution of some 
problems with solutions that are longer than 30 plies. However, for most of the 
problems with a solution longer than 17 plies, T1 is too slow to find the solution 
within a reasonable time limit. 



5 T2: T1 Plus Hash Tables 

5.1 Method 

T2 is also made by Noshita and is an improved version of T1, using similar ideas.
One improvement in T2 over T1 is that the heuristics for king safety are better.
However, the big difference between the two programs is the introduction of hash
tables for transposition, domination and simulation. The search for useless
interposing pieces is now done differently. If a possible useless interposition is
detected, the piece is taken and given back to the defending side. If there is a
mate in the resulting position, then the piece
was indeed a useless interposition and all moves with other pieces dropped in
defense on the same square need not be searched. Here hashing is very useful 
if the position with the capturing piece closer to the king is searched first and 
stored in the hash table. For tsume shogi the introduction of hash tables can 
make a big difference. Noshita gave the position of Diagram 1 before the check 
with the rook on 9a (black rook on 9i instead of 9a) to both T1 and T2. T1 had 
to search 1,500,000 positions to search through all the interposing drops and find 
a mate, while T2 only needed to search 25,000 positions. 

5.2 Results 

T2 runs on a SUN SPARC IPX with 28 MB of memory. For T2 a different test 
set of 3 to 9 ply tsume problems was used. There were 112 problems in the test 
set. T2 found the solutions to the whole problem set in 1 minute. According 
to the author of these tsume-shogi problems, solving all problems in less than 
70 minutes could be considered to be a “professional” performance. Again, the 
solution lengths of these test problems are too short to really test the program. 

T2 has also been given 25 problems with solutions of 19 to 25 plies. T2 could 
solve all these problems, but needs more than 15 minutes per problem for the 
23 and 25 ply tsume problems. 

Finally, T2 has been given the problems in the classic book Zoku-tsumuya-
tsumazaruya (“Mate or no mate?”). In this collection the tsume problem
with the shortest solution has a solution sequence of 11 plies and the longest 
problem has a solution of 873 plies. Even though the collection officially has 200 
tsume problems, the actual number of problems is 195. A few problems are not 
tsume problems and a few problems have no solution because a defense against 
which no mate can be found was overlooked by the composer. 

T2 found the solution of 70 of these hard problems. It is unclear what the
maximum time per problem was, but the published graphs seem to suggest more
than two and a half hours.

6 Ito’s Tsume Program: Best-First Search 

6.1 Method 

Ito’s program was the first to use best-first search. It assigns the following
numbers to the nodes of the search tree:

— Mate: 0

— No possible checks: $\infty$

— Leaf node: KingFreedom

— AND-node: $\sum \mathrm{KingFreedom}_{\mathrm{children}(n)}$

— OR-node: $\min \mathrm{KingFreedom}_{\mathrm{children}(n)}$

The next node to expand is the node where the freedom of the king is minimal. If 
multiple nodes have the same minimal value, one node is chosen randomly. There 
are no depth limits to the search. Transposition, domination and simulation are
dealt with in the same way as in T1 and T2.

6.2 Results

Ito’s program runs on a Sparc Station 10. It was given the same tests as T2. 
It took 2.5 minutes to find the solutions of the 112 easy problems. Of the 25
problems with solution lengths of 19 to 25 plies, Ito’s program could solve 19. It was
much quicker than T2 in solving the 23 and 25 ply tsume problems, using only 
100 seconds on average. 

Ito’s program was much better than T2 in solving the hard problems of 
Zoku-tsumuya-tsumazaruya. It could solve 120 problems. 

The collection Zoku-tsumuya-tsumazaruya has a major drawback: no less 
than 25 problems have been shown to be incorrect, namely having more than 
one solution. In all but one case there was a shorter mate than the one intended 
by the composer of the problem. In one case there was a defense which had a 
solution that was 9 plies longer than the intended solution. The collection is
therefore easier than the number of moves of the intended solutions suggests.

A better test set is Tsume Zuko, the masterpieces of the tsume composer 
Ito Kanju (1719-1761). These 100 tsume problems were published in 1755 and 
are still considered to be among the best problems ever made. In the set of 100 
problems with a solution length of 9 plies to 611 plies (average length 42 plies) 
there are only two problems for which there is a shorter solution than the one 
intended. For these two problems there are repaired versions available with only 
small changes in the position and the solution as intended by Ito Kanju. Ito’s 
program could find the solutions of 63 of these 100 problems.

Ito’s program could not find the solutions of short tsume problems as well as 
T2, but it was a major step forward in deep search. However, like most best-first 
approaches, it suffered from memory problems when searching very deep, since 
too big a portion of the search tree had to be kept in memory. For example,
Ito’s program was not able to find the solution of the 611 ply Kotobuki⁴ problem
with its normal memory management. Only when the memory management was
replaced by a scheme that freed large portions of the search tree that were
unlikely to lead to mate could Ito’s program solve Kotobuki. However, it took
70 hours to find the solution, indicating that too often parts of the search tree 
had to be regenerated. 

7 T3: AO* for Tsume Shogi 

7.1 Method 

T3 is the third tsume-shogi program by Noshita. The methods used in the pro-
gram have been described in Japanese publications.⁵ T3 is based on AO* [23] and
is therefore very different from T1 and T2. There is a priority queue of leaf
nodes which are stored with their expectancy of mate. This value is again based
on the freedom of the king, but in T3 the mating expectancy of several ancestor
nodes of a leaf node is also used in the mate expectancy value of that leaf node.
The node that is at the top of the queue is the node to be expanded next.

4 Kotobuki means “Long life” and is the final problem in the Tsume Zuko problem
set.

5 I have not found a separate publication on T3. Details about the program have been
taken from Japanese papers describing different programs and co-authored by several
tsume-shogi programmers.
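
The priority queue of leaf nodes described above can be sketched with a binary heap. The class `LeafQueue` and the function `mate_expectancy` are hypothetical names; in particular, the way the leaf's own value is blended with the values of its ancestors (a geometric `decay`) is purely an assumption, because the formula used in T3 is not given here.

```python
import heapq
import itertools
from typing import Any, List, Sequence, Tuple


def mate_expectancy(own: float, ancestors: Sequence[float], decay: float = 0.5) -> float:
    """Blend a leaf's own estimate with its ancestors' estimates (nearest first).

    The weighted average with geometric decay is an illustrative assumption.
    """
    total, weight_sum, weight = own, 1.0, decay
    for value in ancestors:
        total += weight * value
        weight_sum += weight
        weight *= decay
    return total / weight_sum


class LeafQueue:
    """Priority queue that returns the leaf with the highest mate expectancy."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, Any]] = []
        self._counter = itertools.count()     # tie-breaker keeps insertion order

    def push(self, leaf: Any, expectancy: float) -> None:
        # heapq is a min-heap, so the expectancy is negated.
        heapq.heappush(self._heap, (-expectancy, next(self._counter), leaf))

    def pop(self) -> Any:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


queue = LeafQueue()
queue.push("leaf-a", mate_expectancy(0.2, []))
queue.push("leaf-b", mate_expectancy(0.9, []))
queue.push("leaf-c", mate_expectancy(0.5, []))
first = queue.pop()     # the most promising leaf is expanded first
second = queue.pop()
```

After a leaf has been expanded, its children would be pushed with their own expectancy values, so the queue always holds the current frontier of the search.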

The second part of the T3 algorithm is the serialization of AND-nodes. Since
all children of an AND-node need to be TRUE, only one of the children needs to be
expanded at a time. The other children are only expanded if the active node returns
a TRUE value. OR-nodes are expanded in the normal way. An enhancement of this
scheme used in T3 is to use a limited alpha-beta search to look for a quick mate
after an AND-node has been expanded. This keeps the mating search in a local
promising area as long as possible.

The third important concept in T3 is a different way of dealing with trans- 
position, domination and simulation. T3 has a set of linked lists that connect 
nodes for which the mating value is related. Only the node on which all other 
nodes in the linked list depend is considered for expansion. All other nodes are 
frozen until this one node returns either a mating or a non-mating value. Then 
the other nodes in the list are revived and the mating sequence of the solved 
node is tried in each of the other nodes in the linked list. As a result, the search 
tree is no longer a proper tree, but an AND/OR graph. 

The data structure (numerous sets of linked lists) necessary for this best- 
first search scheme is large. T3 keeps as much of it as possible in memory. 
However, discarding part of the search tree and the connected linked lists cannot 
be avoided when searching for tsume-shogi problems with very long solutions. 
Regenerating too many nodes will seriously slow down the program, so the speed 
of the program depends very much on the quality of the garbage collection. Still, 
connecting nodes in this way uses less memory than normal hash tables. This is 
because in shogi the concepts of domination and simulation make it necessary 
to store extra information in the hash table about pieces in hand. 

7.2 Results 

T3 ran on the same machine as T2, a SUN SPARC IPX with 28 MB of memory. 
The only published test results of T3 are on the Tsume Zuko problems. T3 could 
find the solution of 68 of these. Kotobuki was solved in 65 hours.

8 Kawano’s Tsume Program: Priority and Simulation 

8.1 Method 

Kawano’s tsume program is based on best-first search. Each node is given a 
priority. At the leaf nodes, the priority is the same as the move ordering according 
to the criteria given for Noshita’s programs. However, this move ordering is only 
done for moves by the defending side. This local ordering is generalized into a 
global ordering of the tree by giving the attacker’s move the priority of its parent 
node and making the move with the highest local priority equal to the priority
of the parent node. This gives an ordering of the nodes in the game tree and the
leaf node to be expanded next.

Kawano deals with the problem of possible simulation by defining a choice 
function for the moves of the attacking side. This choice function defines when 
two moves are the same even though they might be textually different. Simulation
is now defined as follows:



Definition 4 Position P simulates position Q if the move sequences defined by 
the choice function that can be played from P are the same as the move sequences 
defined by the choice function that can be played from Q. 



The choice function is only defined for the moves of the attacking side, so the 
definition of this function does not affect the outcome of the search. If two moves 
are defined by the choice function to be the same, but are actually different (i.e. 
the mating sequence of P does not work in Q), the different defense moves at 
the next move will show that the two positions are not simulating each other. 
The choice function is therefore a shogi heuristic used to guide the search in 
a more promising direction. The choice function treats moves with promoted
pieces as the same as moves with unpromoted pieces, and moves with pieces from
different starting squares as the same (see [13] for more details).
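
A choice function of this kind can be sketched as a mapping from moves to comparison keys. The sketch below is an assumption-laden illustration and not Kawano's implementation: the `Move` record and the functions `choice_key` and `same_choice` are invented here, and ignoring the starting square only for rook, bishop and lance follows the heuristic list of Section 3.3 and is not a detail confirmed for this program.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Square = Tuple[int, int]                     # (file, rank)
LONG_RANGE = {"rook", "bishop", "lance"}     # pieces whose start square is ignored


@dataclass(frozen=True)
class Move:
    """Hypothetical move record; src is None for a drop."""
    piece: str                   # unpromoted piece type, e.g. "rook"
    dst: Square
    src: Optional[Square] = None
    promoted: bool = False       # the piece was already promoted before the move
    promotes: bool = False       # the piece promotes on this move


def choice_key(move: Move) -> tuple:
    """Key under which two attacking moves are treated as the same choice."""
    is_drop = move.src is None
    # For long-range pieces the starting square does not matter.
    src = None if move.piece in LONG_RANGE else move.src
    # The promotion state is deliberately left out of the key.
    return (move.piece, is_drop, src, move.dst)


def same_choice(a: Move, b: Move) -> bool:
    return choice_key(a) == choice_key(b)


rook_a = Move("rook", dst=(5, 1), src=(5, 9))
rook_b = Move("rook", dst=(5, 1), src=(5, 8), promotes=True)
gold_a = Move("gold", dst=(5, 2), src=(4, 3))
gold_b = Move("gold", dst=(5, 2), src=(6, 3))
rook_drop = Move("rook", dst=(5, 1))
```

With these keys the two rook moves are treated as the same choice, so the mating sequence found after one can be tried first after the other, while the two gold moves and the rook drop remain distinct.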



8.2 Results 



Kawano’s tsume program runs on a 400 MHz DEC Alpha with 1 GB of mem-
ory. As for T3, there are only reported results for the Tsume Zuko problems.
Kawano’s program can find the solution of 88 of these problems. The max-
imum time per problem was 2 hours. In Table 1 it can be seen that 81 of these
problems are solved within 100 seconds. The solution of the Kotobuki problem
is found in 6 minutes and 46 seconds.



Table 1. Results of Kawano’s program on Tsume Zuko

CPU time (s)    Solved
0-1                 39
1-10                30
10-100              12
100-1000             6
1000-7200            1
Total               88


9 Seo’s Tsume Program: Conspiracy Numbers

9.1 Method 

Seo’s tsume program [29] uses an algorithm called C*. This name is not
explained, but it was probably chosen because the algorithm is based on AO* but uses
conspiracy numbers to guide the search. In conspiracy-number search a heuristic
value is assigned to each node in the search tree. The conspiracy number of a
node N is the minimal number of leaf nodes in the subtree of N that need to
change their value in order to change the minimax value of N. Nodes in the tree
are then expanded in such a way that they will narrow the range of possible root
values. For the AND/OR trees of tsume shogi, there are only three values
for every node: TRUE (mate), FALSE (no mate) and UNKNOWN. Therefore,
in tsume shogi we have the following rules of assignment of conspiracy numbers
(CN) to a node n:

— Mate node: $CN(n) = 0$

— Node with no possible checks: $CN(n) = \infty$

— Leaf node: $CN(n) = 1$

— AND-node: $CN(n) = \sum_{c \in children(n)} CN(c)$

— OR-node: $CN(n) = \min_{c \in children(n)} CN(c)$



To solve the problem of memory, Seo uses a depth-first iterative deepening 
approach. First, find a solution for a conspiracy number threshold of 1 for every 
node. Then, if no solution is found, set the threshold to 2 and so on until a 
solution is found for conspiracy number threshold n. In general, this iterative 
approach would result in too many regenerated nodes. To keep this to a mini- 
mum, Seo uses big hash tables to store as many positions as possible with their 
conspiracy number. If a position is searched again at iteration p, the conspiracy
number of the node is initialized to the conspiracy number in the hash table,
which is a value smaller than or equal to p - 1 (Seo calls this dynamic evaluation).
Also, to avoid regenerating nodes as much as possible, Seo’s tsume program gives 
priority to the nodes that have not been expanded at previous iterations. Seo 
has been able to keep the average number of regenerated nodes at about 20%. 
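
The assignment rules for conspiracy numbers listed earlier in this section can be illustrated with a small sketch. This is not Seo's implementation: the `Node` class, its field names and the example tree are assumptions made purely for illustration, and the sketch only evaluates a fixed tree rather than performing the threshold-based iterative deepening described above.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical AND/OR tree node for a tsume-shogi search."""
    kind: str                                   # "OR": attacker to move, "AND": defender to move
    children: List["Node"] = field(default_factory=list)
    mate: bool = False                          # the defender is mated in this node
    no_checks: bool = False                     # the attacker has no checking move in this node


def conspiracy_number(n: Node) -> float:
    """Apply the five assignment rules to node n."""
    if n.mate:
        return 0                                # mate node
    if n.no_checks:
        return math.inf                         # no possible checks: cannot be proven
    if not n.children:
        return 1                                # unexpanded leaf node
    values = [conspiracy_number(c) for c in n.children]
    # AND-node: every defence must be refuted (sum); OR-node: one check suffices (min)
    return sum(values) if n.kind == "AND" else min(values)


# Example tree (invented): three candidate checks at the root.
a = Node("AND", [Node("OR"), Node("OR")])                               # two open replies
b = Node("AND", [Node("OR", [Node("AND", mate=True)]), Node("OR")])     # one reply already mated
c = Node("AND", [Node("OR", no_checks=True), Node("OR")])               # one reply escapes
root = Node("OR", [a, b, c])

cn_a, cn_b, cn_c = conspiracy_number(a), conspiracy_number(b), conspiracy_number(c)
cn_root = conspiracy_number(root)
print(cn_a, cn_b, cn_c, cn_root)
```

In this toy tree the root takes the minimum over its children, so the search would prefer the check leading to `b`, where only one leaf still has to change its value.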

Seo also uses move ordering and looks for transposition, domination and
simulation. As in most of the other programs, interposing drops closer to the
king are ordered higher than interposing drops further from the king.
Simulation is used by Seo to try the same mating sequences in positions that
only differ in the interposed pieces and in cases where promotion is optional. He
calls this a killer heuristic, similar to the concept used in chess programs.

9.2 Results 

The first version of Seo’s program ran on a Sun Sparc Station 20 workstation.
Seo’s results are very impressive. His program can find the solution of 190 of
195 problems in the Zoku-tsumuya-tsumazaruya test set and 99 out of the 100
problems in Tsume Zuko, also using a maximum of two hours per problem.
Detailed results on solution speed can be found in Table 2. The Kotobuki problem
was solved in 1 hour and 12 minutes. Seo’s program is slower than Kawano’s
program, but it is more powerful in that it can find the solutions of more
tsume-shogi problems with long solutions.

The only problem in the Tsume Zuko test set that Seo’s program could
not find a solution for was a very complicated 41-ply tsume problem with an
unusually high number of long side variations leading to mate. As a result, the
solution subtree has a very high conspiracy number throughout the search, so
there are only a few node expansions in that vital part of the tree.



Table 2. Results of Seo’s program

CPU time (s)    ZokuTT    Tsume Zuko
0-1                 21             2
1-10                44            23
10-100              68            39
100-1000            39            29
1000-7200           18             6
Total              190            99


Diagram 2. Microcosmos



The holy grail for tsume-shogi programs is the Microcosmos problem
(Diagram 2). This problem was composed by Hashimoto in 1986 and has a solution
length of 1525 plies. Since the publication of his master’s thesis in 1995, Seo
has improved his program and ported it to a 166 MHz Pentium with 256 MB of
memory. This new version was able to solve Microcosmos in about 30 hours in
April 1997.



10 Conclusions and Further Research 

In this paper we have discussed the following features of a good tsume-shogi 
program: 

— an algorithm that can search deeply to variable depths; 

— hash tables to not only deal with transposition, but also with domination 
and simulation; 

— move ordering based on freedom of the king. 



Table 3. Results of all programs on the two hard test sets

Program    Author     ZokuTT    Tsume Zuko    Year
T1         Noshita         -             -    1991
T2         Noshita        70             -    1992
Ito        Ito           135            63    1992
T3         Noshita         -            68    1992
Kawano     Kawano          -            88    1994
Seo        Seo           190            99    1995



Table 3 summarizes the results on the two major test sets for the programs
discussed in this paper. It is not easy to compare the performance of
the programs, since they are running on very different platforms. However, two 
things are clear from this table and from the data summarized in the previous 
sections. One is that most of the current tsume-shogi programs perform better 
than human experts. The two tsume-shogi collections are considered to be very 
hard and the general opinion among shogi players is that no human player will 
be able to solve more than 80% of these problems. The recognition of the perfor- 
mance of the tsume-shogi programs is further supported by the fact that shogi 
magazines these days use tsume-shogi programs to aid in the analysis of difficult 
endgames played by top professional players. 

The second conclusion is that Seo’s tsume program is clearly the best of
the tsume programs discussed in this paper. Since different hardware is used,
comparing Ito, T3 and Kawano’s tsume program is almost impossible. In particular,
Kawano’s tsume program runs on one of the fastest and biggest machines
currently available. It would be interesting to test T3 again on such a machine.
However, even with Kawano’s extra computing power, Seo’s program still finds
more solutions of long tsume-shogi problems. Seo’s program is able to find the
solution of almost any tsume problem and it will be hard to write a program
that improves on it. Still, even though Microcosmos has been solved, there remain a
number of other long tsume-shogi problems for which Seo’s program cannot find
the solution in a limited time.

One possible improvement of Seo’s program might be to use proof-number
search instead of conspiracy-number search. Proof-number search is
an improved version of conspiracy-number search designed especially to solve
AND/OR trees. We have started to develop a tsume-shogi program for our
shogi-playing system SPEAR based on proof-number search. Although work on
this tsume-shogi program has not been finished yet, preliminary results show that
for a significant number of test problems, smaller search trees are built than in
Seo’s program. Improving the tsume-shogi program in SPEAR is future work.

It is interesting that the complexity of a tsume-shogi problem cannot be defined
by the length of the solution sequence. For the 17 problems in the range from 
9 plies to 19 plies, Seo’s program on average needed 150 seconds per problem, 
while for the 18 problems from 31 plies to 39 plies, Seo’s program took only 82 
seconds per problem. Furthermore, there was a relatively short 23 ply tsume- 
shogi problem in the Tsume Zuko test set that could not be solved by Kawano 
and for which Seo took almost 1.5 hours. Variable-depth search is clearly having 
problems with other features of a tsume problem. For human players also, the 
difficulty of a tsume problem is not necessarily related to the length of the 
solution. It would be interesting to see if there is a correlation between the 
difficulty of tsume-shogi problems for variable-depth search and human experts. 

Tsume shogi no longer seems to be a hard problem. In this paper we have 
discussed methods for building a strong tsume program. However, there is still 
a good number of problems for which the strongest programs cannot find a 
solution. Furthermore, the time limit of two hours for the problems in the test 
set is long, even though it has become the standard time limit for most tests with 
the programs discussed. Algorithmic improvements to get the same results with 
a stricter time limit are another challenge left for tsume-shogi programs. Finally, 
there is one problem remaining with most variable-depth tsume-shogi programs. 
It is possible that a promising check at node N is searched deeply and that a 
mate is found in p moves (p > 1). This will end the search at this OR-node, 
even if there is a mate in one at a sibling node of node N. Most programs have 
some heuristics to avoid this problem as much as possible, but except for the 
alpha-beta search in T2, all programs discussed in this paper from time to time 
give solution sequences that are too long. The simple solution to this problem 
would be to regenerate the search tree after a mate is found and search all child 
nodes of OR-nodes that were not expanded for shorter mating sequences. It is 
unclear how much search overhead this will cause. 

The important question is of course how these successful results can be used 
outside the domain of tsume shogi. After all, tsume shogi is only important in 
the final stages of a shogi game. Can the techniques discussed in this paper help 
in building a strong shogi-playing program? For the time being, this remains 
an open question. As said, the branching factor of normal shogi (80) is much
larger than the branching factor of tsume shogi (5). The size of the search tree
for a 40 ply tsume-shogi problem is therefore about the same as a 15 ply search
in a normal shogi position. Such a deep search is not out of reach for
tsume-shogi programs, so the methods discussed in this paper might be interesting
for a normal shogi-playing program as well. Conspiracy-number search has been
shown to be applicable to normal minimax game trees. One big problem is 
of course to set the correct search target which is trivial in tsume shogi. This 
problem is illustrated by Seo, who has made a shogi program based on his work in 
tsume shogi. His program thus far cannot compete with the strongest programs, 
despite its obvious strength in tsume shogi. We believe that variable-depth search 
algorithms are worth further investigating for shogi and we intend to develop a 
shogi program based on these ideas in the future. 
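
The tree-size comparison made above (branching factor 5 at 40 plies versus branching factor 80 at 15 plies) can be checked with a few lines of arithmetic. The sketch below uses the crude estimate of branching factor raised to the depth; the function name is an illustrative assumption, and the branching factors are the ones quoted in the text.

```python
import math


def tree_size_log10(branching_factor: float, depth: int) -> float:
    """log10 of branching_factor ** depth, a crude estimate of the number of leaves."""
    return depth * math.log10(branching_factor)


tsume_log10 = tree_size_log10(5, 40)     # 40 ply tsume-shogi problem
normal_log10 = tree_size_log10(80, 15)   # 15 ply search in normal shogi

print(f"tsume shogi, 40 ply: about 10^{tsume_log10:.1f} leaves")
print(f"normal shogi, 15 ply: about 10^{normal_log10:.1f} leaves")
```

Both estimates come out near 10^28 and differ by well under one order of magnitude, which is consistent with the claim that the two trees are about the same size.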

Abstract. This paper explores evolutionary changes of Shogi (Japanese
chess) using game-theoretic analyses by computer. Heian Shogi is an 
ancient game only briefly described in the literature. Therefore, it is 
impossible to know exactly how it was played. Through game-theoretic 
analyses of rules, we estimate the historical changes of this ancient game. 
Our method provides a new innovative approach to guess logically how 
these ancient games actually have been played. This paper focuses upon 
the game results of the KGK endgame on NxN boards, applying game- 
programming methods. Then it determines the size of the boards in which 
the side of King and Gold always wins except trivially drawn cases with 
the Gold being captured. Based on the analyses, we discuss the rules of 
Heian Shogi. We specifically provide a logical interpretation of the shift 
from the 8x8 board to the 9x9 board in the evolutionary history of Shogi. 
Keywords: evolution of games, retrograde analysis, KGK endgame, 
Shogi, Heian Shogi 



1 Introduction 

Shogi (Japanese chess) is special among chess games in the world, because it is
the only chess variant which has a rule to reuse captured pieces. This reuse rule 
of Shogi makes the game extremely complex. (For an introduction to Shogi, see 
[Q). Because of such uniqueness in Shogi, it is very interesting to know how Shogi 
was invented or evolved from ancient types of chess without the rule of reusing
the pieces. To know the history of the present-day version of Shogi (hereafter called
modern Shogi), it is very important to know what kind of Shogi variations were
played in the old days.

According to Nichureki, a famous old document on Shogi (see Appendix A),
Heian Shogi, an old style of Shogi, was already played in the
Heian Era (794-1191). Heian Shogi is considered an archetype of modern Shogi,
because of the basic similarity of the game structure. However, it is distinctly
different from modern Shogi in a few aspects.

Interestingly, at least two types of Heian Shogi are known. One type,
shown in Fig. 1 as Type (1), is very similar to modern Shogi, except that it lacks a
Rook and a Bishop. It also uses a 9x8 board, instead of the 9x9 board used by
modern Shogi. The other type, shown in Fig. 2 as Type (2), has only one Gold
and uses an 8x8 board. In those days, Heian Shogi had no rule of reusing captured
pieces, unlike modern Shogi shown in Fig. 3.

Fig. 1. The initial position of Heian Shogi, Type (1)

It is played without Rooks and Bishops on a 9x8 board and does not reuse
captured pieces.



1.1 Rules of Heian Shogi 

There is no good reference that describes the rules of Heian Shogi. Nichureki
(see Appendix A) is the only source, and it only briefly describes how a game
ends (hereafter called the Nichureki Rule): “The player wins if he
takes all the pieces of the opponent, except the King.”

From the Nichureki Rule, we consider four rules of Heian Shogi, as described
below (R1 to R4).

From these rules, we evaluate how the outcomes (win, loss or draw) depend
on the size of the board.

Fig. 2. The initial position of Heian Shogi, Type (2)

It is played without Rooks and Bishops, and has only one Gold for each side
on an 8x8 board, and does not reuse captured pieces.

Fig. 3. The initial position of modern Shogi

It has Rooks and Bishops, and two Golds for each side on a 9x9 board, and
can reuse captured pieces.



R1: We assume that the rules about the movement and promotion of pieces
and forbidden moves are the same as in modern Shogi. However, captured
pieces are not used again.

R2: If the player has no legal moves, he loses the game.

R3: The player wins if he takes all the opponent’s pieces except the King.

R4: The game is drawn if both players have only a King (we define this condition
as a trivial draw, to contrast it with other draws). The repetition of an
identical position is also a draw.

1.2 The Shift from 8x8 Boards to 9x9 Boards

When two experienced players (e.g., Shogi grandmasters) play a game of Heian
Shogi, they often reach a “King and Gold vs King” endgame, which we denote
as a KGK endgame.

Furthermore, they may also sometimes reach a “King and Pawn vs King”
endgame which will lead to a KGK endgame by the promotion of the Pawn,
except in the evident cases where the Pawn is captured. Therefore, the results of
games (win, loss, or draw) in Heian Shogi should be closely related to the
results of the KGK endgame.

In this paper, we analyze the results of KGK endgames on NxN boards,
using the retrograde-analysis method. We determine the size of the boards
on which the side of King and Gold always wins, except in evident draw cases
where the Gold is captured. Based on the analyses, we discuss the rules of Heian
Shogi. We specifically provide a logical interpretation of the shift from the 8x8
boards to the 9x9 boards in the evolutionary history of Shogi.

2 Retrograde Analyses and the Classification of KGK 
Endgames 

In this section, we classify all the positions of the KGK endgame by the
retrograde-analysis method. For clarity, the two players are distinguished as the
attacker for the “King and Gold” side and the defender for the “King alone”
side.

We first describe all the positions-in-mate, the final winning conditions for
the attacker. Then we expand the positions-to-mate backward by the
retrograde-analysis method, and count all the attacker’s winning positions. In these analyses
we also take into account the size of the board, and we determine on what size
the attacker always wins, except in the evident drawn cases in which the Gold is
captured.

2.1 Retrograde Analyses of the Attacker’s Winning Positions 

Let X be the set of all legal positions of the KGK endgame on an NxN board.
Within X, let $W_0$ be the set of all the attacker’s won positions, i.e.,
positions-in-mate. Note that, according to rule R2, any position where the defender
is unable to move is a position in $W_0$.

Next, let $W_1$ be the set of all positions from which at least one move by the
attacker leads to some position in $W_0$. In other words, any position
in $W_1$ is a position where the attacker is to move and has at least one legal move
that leads to a position in $W_0$.

Then, let $W_2$ be the set of all positions from which every defender’s move
leads to some position in $W_1$.

In this manner, we expand the attacker’s winning positions by retrograde
analysis, from positions-in-mate to positions-to-mate-in-n, with n some
positive integer.

In this procedure, to avoid repetitions, we construct the sets $W_i$ such that
they are disjoint, i.e., such that for any $i, j$ ($0 \le i < j$)

$$W_i \cap W_j = \emptyset.$$

Because the set of all positions of the KGK endgame X is finite, retrograde
analysis always converges to some integer m, such that

$$W_m \ne \emptyset \quad \text{and} \quad W_{m+1} = \emptyset. \qquad (1)$$

From any position in $W_m$ the attacker will be able to win in at most m steps,
irrespective of the defender’s responses. Because m is the length of the longest
path for the attacker’s definite win, we call such a position a longest
position-to-mate for a given NxN board. For two board sizes (N = 9 and 10), examples
are shown in Appendix B. 
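
The backward expansion from positions-in-mate described above can be sketched on an abstract game graph. This is a minimal illustration and not the authors' program: the function name, the dictionary-based position graph and the toy positions are all assumptions, and the sketch presumes that attacker and defender strictly alternate moves.

```python
from typing import Dict, List, Set


def retrograde_layers(successors: Dict[str, List[str]],
                      attacker_to_move: Set[str],
                      mated: Set[str]) -> List[Set[str]]:
    """Return the disjoint layers [W_0, W_1, ..., W_m] of attacker wins.

    successors maps every position to the positions reachable in one move,
    attacker_to_move holds the positions where the attacker moves, and
    mated is W_0 (defender to move and unable to move).
    """
    layers: List[Set[str]] = [set(mated)]
    won: Set[str] = set(mated)
    while True:
        attacker_turn = len(layers) % 2 == 1      # W_1, W_3, ...: attacker to move
        new: Set[str] = set()
        for pos, succ in successors.items():
            if pos in won or (pos in attacker_to_move) != attacker_turn:
                continue
            if attacker_turn:
                ok = any(s in won for s in succ)                  # one winning move suffices
            else:
                ok = bool(succ) and all(s in won for s in succ)   # every defence loses
            if ok:
                new.add(pos)
        if not new:                               # W_{m+1} is empty: stop
            return layers
        layers.append(new)
        won |= new


# Toy graph (invented): "a*" are attacker-to-move, "d*" and "m" defender-to-move.
successors = {
    "m": [],                 # defender is mated
    "a1": ["m"],
    "d1": ["a1"],
    "d2": ["a1", "a2"],      # defender can escape into the loop a2 <-> d2
    "a2": ["d2"],
    "a3": ["d1", "d2"],
}
layers = retrograde_layers(successors, {"a1", "a2", "a3"}, {"m"})
print(layers)
```

In the toy graph the positions `d2` and `a2` end up in no layer, because the defender can always step back into the loop between them; this is the kind of inconclusive position that the analysis of non-trivial draws deals with.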

Now we can determine the set of all possible winning positions, denoted as
$\bigcup_{i=0}^{m} W_i$, in which the attacker can always win, irrespective of the defender’s
responses.

2.2 Retrograde Analyses of Trivial Draws 

It is trivial that a draw happens when the defender captures the attacker’s Gold
(called a trivial draw). Let $D_0$ be the set of all positions of trivial draws, i.e.,
all positions in which the lone King can immediately capture the Gold.

We now expand $D_i$ from $D_0$ in the same manner as we did for $W_i$. Let $D_1$ be
the set of all positions in which any move by the attacker leads to some position
in $D_0$.

Then, let $D_2$ be the set of all positions in which the defender has at least one
move that leads to a trivial drawn position in $D_1$, etcetera.

In these retrograde analyses of $D_i$, we again require that for $i, j$ ($0 \le i < j$)

$$D_i \cap D_j = \emptyset.$$

As for $W_i$, because the set of all possible positions X is finite, there exists a
finite integer n such that

$$D_n \ne \emptyset \quad \text{and} \quad D_{n+1} = \emptyset. \qquad (2)$$

Now we can determine the set of all positions, denoted as $\bigcup_{j=0}^{n} D_j$, in which
the defender can always achieve a trivial draw, irrespective of the attacker’s
moves.

2.3 Analyses of Non-trivial Draws 

The previous analyses consider the positions that lead to either the attacker’s win, or
a trivial draw in which the defender captures the Gold. We know by experience
that there are some positions which do not belong to either case. In the following,
we consider such inconclusive positions.

Some positions may belong to neither $\bigcup_{i=0}^{m} W_i$ nor $\bigcup_{j=0}^{n} D_j$. Let P be one
of such inconclusive positions where the attacker is to move. In P, the attacker
has no move that leads to $\bigcup_{i=0}^{m} W_i$. Meanwhile, $P \notin \bigcup_{j=0}^{n} D_j$. Therefore, there
always exists a move $P \to P_1$ such that $P_1 \notin \bigcup_{j=0}^{n} D_j$, and the attacker should
choose such a move.

Thus we get the position $P_1$ which satisfies:

$$P_1 \notin \bigcup_{i=0}^{m} W_i \cup \bigcup_{j=0}^{n} D_j. \qquad (3)$$

Similarly, in $P_1$ where the defender is to move, he has no move that leads to
any position of $\bigcup_{j=0}^{n} D_j$. Therefore, there always exists a move $P_1 \to P_2$ such
that $P_2 \notin \bigcup_{i=0}^{m} W_i$, and the defender should choose such a move. Thus we get
the position $P_2$ that satisfies:

$$P_2 \notin \bigcup_{i=0}^{m} W_i \cup \bigcup_{j=0}^{n} D_j. \qquad (4)$$

By repeating this procedure, we obtain a (potentially infinite) sequence of
positions:

$$P \to P_1 \to P_2 \to \cdots \to P_l \to \cdots \qquad (5)$$

It is easy to show that this sequence always contains a loop for any P, as
explained below. Note that the following condition holds for any such $P_k$:

$$P_k \notin \bigcup_{i=0}^{m} W_i \cup \bigcup_{j=0}^{n} D_j. \qquad (6)$$

Since any position $P_k$ from such a sequence is a KGK position, it must be
contained in the finite set X. Therefore, some position $P_l$ must be identical to
some position $P_j$ with $j < l$, leading to a loop.

Thus any position $P_k$ that satisfies relation (6) has at least one move
that leads to a loop. For the attacker, such a move avoids a trivial draw,
and for the defender, it avoids the attacker’s win. Therefore, the game is a draw,
according to rule R4, because it returns to the same position repeatedly.

2.4 The Definition of Deterministic Wins 

Now we consider the relationship between board size and the attacker’s winning 
positions. First we evaluate the size of the boards in which the attacker is always 
able to win a KGK endgame. We introduce the concepts of deterministic win 
and complete deterministic win. 

A KGK endgame on a given NxN board is called a deterministic win if
the set of all KGK endgame positions (say X) satisfies:

$$X = \bigcup_{i=0}^{m} W_i \cup \bigcup_{j=0}^{n} D_j. \qquad (7)$$

Relation (7) means that the set of non-trivial draws (loop cases) is empty.



324 Hiroyuki Iida et al. 



A KGK endgame on the 3x3 board is called a complete deterministic
win, since all draw sets $D_j$ are empty.

For every deterministically won KGK endgame the sets on the right-hand side of
equation (7) are disjoint. Therefore, the number of elements |X| of the
whole set X satisfies:

$$|X| = \sum_{i=0}^{m} |W_i| + \sum_{j=0}^{n} |D_j|. \qquad (8)$$

We show a relation between board size and the length of the mating sequence
of a longest position-to-mate in Theorem 1.

Theorem 1.

Let P and m(N) be a longest position-to-mate and the number of its steps for
a given NxN board, respectively. If two board sizes (say $N_1$ and $N_2$) satisfy
condition (7), then

$$N_1 < N_2 \Rightarrow m(N_1) < m(N_2). \qquad (9)$$



Proof

Suppose position P on an NxN board. A mating sequence for P always will
lead to a mate in which the defending King is mated on an edge or in a corner
(no mate in the middle of the board is possible). Construct the (N+1)x(N+1)
board position from P by adding one row and one file to the NxN board at
the edge or corner where the mating process will take place (if mated at the
upper or lower edge, the column may be added arbitrarily at the left or right
side; similarly for a mate at the left or right edge). Then, from the proposed
longest mating sequence on the NxN board the defending King may escape to
the additional space on the (N+1)x(N+1) board, giving rise to at least some
additional moves before it is mated. Since no trivial draw is involved, the only other
possibility would be that the defending side can enter a non-trivial draw loop, in
contradiction with equation (7). As a consequence, the longest mating sequence
on an (N+1)x(N+1) board without non-trivial draws is strictly longer than such
a sequence on an NxN board. □

Let A be the set of non-trivial draws. The following relation obviously is
derived from Theorem 1:

$$m(N) > m(N+1) \Rightarrow A \ne \emptyset. \qquad (10)$$

We can determine the maximum size of the board for which KGK is a
deterministic win, by looking for the first occurrence of the following condition:

$$m(N) > m(N+1) \qquad (11)$$

during the counting process of the attacker’s winning positions in the retrograde 
analyses. 
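
As a minimal sketch of this criterion, the function below scans the longest position-to-mate lengths m(N) in order of increasing board size and reports the first N at which m(N) > m(N + 1). The function name is an assumption, and the numbers in the example are invented purely to exercise the code; they are not the values computed in this paper.

```python
from typing import Dict, Optional


def max_deterministic_size(m: Dict[int, int]) -> Optional[int]:
    """Return the first board size N with m[N] > m[N + 1], or None if there is none.

    m maps a board size N to the number of steps of a longest position-to-mate.
    """
    for n in sorted(m):
        if n + 1 in m and m[n] > m[n + 1]:
            return n
    return None


# Invented example values, for illustration only.
example_m = {3: 4, 4: 9, 5: 14, 6: 12, 7: 10}
first_drop = max_deterministic_size(example_m)
print(first_drop)
```

With strictly increasing values the function returns `None`, meaning that no violation of the monotonicity in Theorem 1 was observed in the range examined.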

2.5 Counting All Positions

We have written a computer program that optimally plays KGK endgames on
an NxN board. This program examines all possible moves for each player and
determines the best move in each step. We have analysed the KGK endgame on
NxN boards for N = 3, ..., 15, by counting all positions in $W_i$ and $D_j$ defined
in Sections 2.1 and 2.2, respectively.

Table 1 shows the result of counting all attacker’s winning positions, trivial- 
draw positions and non-trivial-draw positions. 



Table 1. The number of all the attacker’s winning positions and draw positions
on an NxN board.
N represents the board size. |X| represents the size of the set X of all legal
positions of the KGK endgame on an NxN board. Similarly, |W|, |D| and |D'|
represent the number of all winning positions, the number of all trivial-draw
positions, and the number of all non-trivial-draw positions, respectively. m(N)
represents the number of steps for each longest position-to-mate.



From Table 1 we see that the KGK endgame is a deterministic win on boards
of sizes up to N = 10, and that the endgame includes non-trivial draws for board
sizes with $N \ge 11$.

The data are graphically depicted as percentages in Fig. 4. We clearly see
that from N = 11 upwards the percentage of won positions drastically decreases
from approximately 80% to almost zero, whereas non-trivial draws behave in exactly
the opposite way. Trivial draws always constitute some 20%, irrespective of board size.

3 Discussion 

From these results of the KGK endgames, we can infer some results on the 
evolutionary history of Heian Shogi. 



Fig. 4. The ratio of three different types of positions as a function of board size.



3.1 Symmetry of the Initial Position 

In modern Shogi played on the 9x9 board the initial position is left-right
symmetric around the center file (the file of the Kings), except for the Rooks and Bishops,
which are symmetric to each other. However, at least two different sizes of the
board are known in Heian Shogi: the 8x8 and 8x9 boards (Fig. 1 and Fig. 2).

The initial position cannot be left-right symmetric on the 8x8 board because 
of the even number of files. However, from the viewpoints of the conceptual 
beauty, perfection and excellence of games, left-right symmetry seems very im- 
portant in chess-like games. The initial position in the 8x9 board is left-right 
symmetric around the King’s file. It is therefore natural to imagine that the 8x8 
board is a primitive type of Heian Shogi and the 8x9 board an advanced type, 
being a transition to the 9x9 board of modern Shogi. 

Another interesting point in the initial positions of Shogi games is the
symmetry between the lower side and the upper side. There are two types of symmetry:
point symmetry to the center of the board and line symmetry between the two
sides. The initial position of the 8x8 board is point symmetric, but not line
symmetric, because of the placement of Kings and Golds (see Fig. 2). In contrast, the
8x9 board is both point symmetric and line symmetric (see Fig. 1). Modern Shogi
is both point symmetric and line symmetric except for the Rooks and Bishops,
which are not used in Heian Shogi.

Suppose that we add one row between the two sides to the 8x8 and 8x9
boards, spacing three squares between the opposing Pawns, instead of two
squares. Because Heian Shogi does not reuse the pieces, such games would follow
mimic play (the second player can never lose if he moves exactly line-symmetric
to the first player, since the first player will never put a piece beyond the fourth
row, because it simply would be captured by the second player). Such games
therefore necessarily would lead to draws.

Furthermore, Heian Shogi has no Rooks and Bishops. Therefore, it is logical 
for Heian Shogi to have only eight rows. From the technical points of view, it is 
natural and logical to suppose that the addition of one row to make nine rows
was accompanied by the introduction of both Rooks and Bishops during the
transition from Heian Shogi (eight rows) to modern Shogi (nine rows).

3.2 Adding Rooks and Bishops to Heian Shogi 

Suppose that Heian Shogi were played with Rooks and Bishops, but without
the reuse rule for captured pieces. Such a game is highly likely to end up in
the KGK endgame. Suppose that a player has at least one piece more than the
other player. Then, as a strategy, he should try to exchange pieces, aiming at
the KGK endgame (or better). Exchanging pieces when ahead is often a good
strategy in many other chess-like games also.

As a consequence, it seems logical that the introduction of Rooks
and Bishops was accompanied, immediately or soon afterwards, by the
introduction of the reuse rule for captured pieces.

3.3 Harmony between the Perfection of Beauty and the Evolution 
of Complexity 

In the KGK endgame of Heian Shogi, the maximum board size for the attacker’s 
deterministic wins (the attacker wins except trivial draws by capturing the Gold) 
is the 10x10 board. However, if the board size is even, the left-right symmetry 
of the pieces in the initial position is broken (see FigJUand FigQ). To maintain 
the left-right symmetry the board size should be odd. Therefore, the maximum 
board size for the attacker’s deterministic wins with left-right symmetry is 9x9. 

From a mathematical as well as a historical viewpoint, this explanation seems
to be highly in line with the use of the 9x9 board in modern Shogi. The history of
games seems to be a development of the harmonic balance between the perfection
of beauty and the evolution of complexity. Here, beauty implies some form of
simplicity and is often opposed to complexity. Modern Shogi is the result of nearly
1000 years of perfection from Heian Shogi. We could suspect that, during such a
long history of perfection, in order to fulfil the desire for strategic complications,
Shogi came to be played on the 9x9 board.

In this, the desire for strategic complications can be satisfied by maximizing
the board size, under the constraint that the attacker still has a
deterministic win in the KGK endgame. If the condition of a deterministic win
is not kept and the board size is increased beyond this maximum, Shogi becomes
indeterministic and chaotic. Thus we could say that, from the viewpoint of the harmony
between perfection and complications, 9x9 Shogi has become a highly matured
game compared with other chess-like games with different board sizes.

3.4 Heian Dai-Shogi: A Dead-End of the Early Evolution

In order to deal with the need for strategic complications, it may be an
obvious idea to enlarge the board while adding other kinds of pieces and/or
increasing the number of pieces, by which the average number of legal moves
easily grows. In fact, Heian Dai-Shogi, described in Nichureki (Appendix A), is
a large-size variation of Heian Shogi, played on a 13x13 board. We judge it
to have been a dead-end of the early evolution. This is because Heian Dai-Shogi
gained little in strategic complications from the increased number
of legal moves, while showing a big disadvantage with respect to the beauty of
the rules by requiring a coarse-grained rule to judge the ending, like rule R3.

3.5 The Rules of Heian Shogi 

In the beginning we proposed R1, R2, R3 and R4 as rules of Heian Shogi. The 
rules of Heian Shogi presumably evolved gradually over many centuries and were 
polished into a set of necessary and sufficient rules. 

From this historical view, we consider the relationships among the proposed 
rules of Heian Shogi: 

— R3, a rule of Nichureki, contradicts R4 in the following two cases. First, 
the repetition of an identical position in KGK is considered a non-trivial 
draw according to R4, whereas R3 treats this position as an attacker’s win. 
Second, trivial draws belonging to $\bigcup_j D_j$ according to R4 are 
treated as attacker’s wins under R3. 

— R3 includes all the positions of R2, since (unlike in western chess) any 
piece in Shogi other than the King can always move while it is on the board. 
Thus, if one side has no legal moves, it must have only a King. All the 
positions defined as an attacker’s win by R2 are already covered by R3, so R2 
is superfluous when R3 is included in the set of rules. 
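
For the KGK material, the claim that a Gold can always move can be checked mechanically. The sketch below assumes a hypothetical coordinate convention (row 0 is the far edge, so "forward" means decreasing row) and hypothetical names `GOLD_STEPS`, `gold_targets` and `min_gold_targets`; it only counts target squares on the board and ignores occupancy.

```python
# Gold steps as (d_row, d_col); "forward" is assumed to be d_row = -1.
# The two diagonal-backward steps are excluded, as in the Nichureki rule.
GOLD_STEPS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, 0)]

def gold_targets(n, row, col):
    """Squares on an n x n board reachable by a Gold from (row, col)."""
    return [(row + dr, col + dc) for dr, dc in GOLD_STEPS
            if 0 <= row + dr < n and 0 <= col + dc < n]

def min_gold_targets(n):
    """Smallest number of on-board target squares over all Gold squares."""
    return min(len(gold_targets(n, r, c)) for r in range(n) for c in range(n))

print(min_gold_targets(9))  # 2
```

Since the worst case (a far-edge corner) still leaves two target squares and only one friendly piece, the own King, can block one of them, the Gold always keeps at least one move in KGK under these assumptions.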

From these considerations we expect that the following two combinations of 
rules are logically consistent: 

1. Rule Set 1: {R1, R3} 

2. Rule Set 2: {R1, R2, R4} 

The advantage of Rule Set 1 is that the conditions for win and loss are easily 
defined and checked for an arbitrary board size (especially effective for 
N > 11). However, Rule Set 1 has a strong disadvantage in that some positions 
that should reasonably be considered draws are treated as attacker’s wins. 
Fig. 5 is one such case, where the “King and Gold” side is declared the winner 
immediately by Rule Set 1. This is in agreement with the Nichureki rule saying 
that “The King only means a loss even if just one move behind.” In modern Shogi 
such a position is naturally a drawn position, which feels intuitively correct. 
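
How little is needed to judge a game under Rule Set 1 can be illustrated with a short sketch. It assumes that R3 is the bare-King rule quoted from Nichureki and that a position is given as a mapping from side names to piece lists; the function name `judge_rule_set_1` and this representation are hypothetical.

```python
def judge_rule_set_1(pieces):
    """Return the winning side under the assumed bare-King rule, or None.

    `pieces` maps each side name to the list of its pieces on the board.
    A side left with only its King loses at once (assumed reading of R3),
    no matter whose turn it is or whether it could capture on the next move.
    """
    bare = [side for side, own in pieces.items() if own == ["King"]]
    if len(bare) != 1:
        return None  # nobody, or both sides, reduced to a bare King
    (loser,) = bare
    return next(side for side in pieces if side != loser)

position = {"attacker": ["King", "Gold"], "defender": ["King"]}
print(judge_rule_set_1(position))  # attacker
```

The check needs neither the board size nor the move history, which is why it scales to any N; the price is that a position in which the Gold is about to be captured is still judged a win.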

Fig. 5. An example of a position where the Gold can always be captured by the 
lone King, even if the attacker is to move. 

On the other hand, Rule Set 2 has a logical background similar to that of the 
rules of modern Shogi. A minor disadvantage is that judging the game result is 
highly tedious and difficult when the game is played on a big board, e.g., in 
Dai-Shogi (a type of Heian Shogi that is played on a 13x13 board). In a game 
played on a 13x13 board, we have to remember all the positions of a KGK endgame 
in order to detect non-trivial draws. Otherwise, we need a rule like the 
50-move rule in western chess. 
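
The bookkeeping that Rule Set 2 requires can be sketched as follows. This is a minimal illustration, assuming that a position is stored as a hashable tuple (side to move, attacker King square, Gold square, lone King square); the function name `find_repetition` and the tuple layout are hypothetical.

```python
def find_repetition(positions):
    """Return (first_index, repeat_index) of the first repeated position, or None.

    `positions` is the sequence of positions reached during the endgame,
    each a hashable value such as
    (side_to_move, attacker_king, gold, lone_king).
    """
    seen = {}  # position -> index of its first occurrence
    for i, pos in enumerate(positions):
        if pos in seen:
            return seen[pos], i
        seen[pos] = i
    return None

game = [
    ("K", (0, 0), (2, 2), (4, 4)),
    ("A", (0, 0), (2, 2), (4, 3)),
    ("K", (0, 1), (2, 2), (4, 3)),
    ("A", (0, 1), (2, 2), (4, 4)),
    ("K", (0, 0), (2, 2), (4, 4)),  # same as index 0
]
print(find_repetition(game))  # (0, 4)
```

The table of seen positions is what makes the rule tedious for human players on a large board: it grows with the length of the endgame.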

Thus Rule Set 1 seems more primitive, but its simplicity makes it more suitable 
for Heian Shogi, which was played on a large variety of board sizes. In 
contrast, Rule Set 2 is more elaborate, so that it is suitable for the 9x9 
board used in modern Shogi. Therefore the original rules of Heian Shogi may 
have been similar to Rule Set 1, while Rule Set 2 is a later rule set marking a 
transition (Pre-Shogi) between Heian Shogi and modern Shogi. 

4 Conclusions and Future Work 

In this paper we have gathered some evidence for evolutionary changes of Shogi 
using game-theoretic analyses by computer. Heian Shogi is an ancient game that 
is only briefly described in the sparse literature. Therefore, it is impossible 
to know exactly how it was played. Through game-theoretic analyses of the 
proposed rules, we feel that we have gained more insight into how these ancient 
games were actually played. 

We demonstrated that 10x10 is the largest board size on which the KGK 
endgame of Heian Shogi is a deterministic win. Based on the analyses of KGK 
endgames, we further showed that Rule Set 1, {R1, R3}, contains the rules most 
likely used in primitive Heian Shogi, whereas Rule Set 2, {R1, R2, R4}, is a 
more elaborate set and probably a transition to modern Shogi. 

Future projects include: 

— To investigate the structural attributes that determine why on small boards 
all KGK positions are wins (except for trivial draws), whereas on larger 
boards non-trivial drawn KGK positions are possible. 

— To study the relationship between this analysis of KGK endgames and the 
rule in modern Shogi that allows captured pieces to be reused. 

Through these analyses, we hope to find the missing link between modern Shogi 
and the ancient Heian Shogi. 

A The Description of Heian Shogi in Nichureki 

Nichureki is said to have been edited by Tameyasu Miyoshi, a scholar of 
mathematics, in the late Kamakura Era (14th century), combining Shochureki and 
Kaichureki, which are supposed to have been written in the late Heian Era 
(12th century). Nichureki briefly describes two kinds of Shogi: Heian Shogi and 
Heian Dai-Shogi. These two Shogi games are thought to be prototypes of modern 
Shogi. The following is a translation of an excerpt from Nichureki describing 
Heian Shogi and Heian Dai-Shogi: 

— the King can move in all directions; 

— the Gold cannot move to the two diagonal squares behind; 

— the Silver cannot move to the left, the right and straight behind; 

— the Knight can move to the squares directly ahead of the front corners; 

— the Lance can move to any square straight ahead; 

— the Pawn can only move one square ahead; 

— all pieces are promoted to Gold by entering the three rows of the opponent 
side; 

— the attacker wins as soon as the opponent becomes the King alone. 

From this description, Tandai proposed the following hypothesis: 
“Shogi games in the Heian Era used (were played on) the 8x8 board and 
Pawns located in the second rows. The reason for the use of the 8x8 board is 
that, if the 9x9 board is used, the second player can never lose if he moves 
exactly line-symmetric to the first player’s moves, i.e., by a mimic strategy. 
Next, from the description ‘All pieces are promoted to Gold by entering the 
three rows of the opponent side,’ the third row can be considered as the border 
line or intersection; therefore the Pawns should line up in the second row.” 
(translated from Japanese by the authors) 

Nevertheless, it is impossible to specify the rules of Heian Shogi precisely 
from the description of Nichureki. 
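
Even though the full rules cannot be recovered, the movement rules quoted from Nichureki above can be written down as a small move table. The sketch below is only one possible reading: it assumes (row, column) coordinates with row 0 as the far edge (so "ahead" means decreasing row), an empty board with no blocking pieces, and hypothetical names `STEPS`, `targets` and `promotes`.

```python
KING = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]

# Single-step pieces as (d_row, d_col) offsets; "ahead" is d_row = -1.
STEPS = {
    "King": KING,
    "Gold": [s for s in KING if s not in ((1, -1), (1, 1))],
    "Silver": [s for s in KING if s not in ((0, -1), (0, 1), (1, 0))],
    "Knight": [(-2, -1), (-2, 1)],
    "Pawn": [(-1, 0)],
}

def targets(piece, row, col, n):
    """On-board target squares of `piece` on an empty n x n board."""
    if piece == "Lance":  # slides any distance straight ahead
        return [(r, col) for r in range(row - 1, -1, -1)]
    return [(row + dr, col + dc) for dr, dc in STEPS[piece]
            if 0 <= row + dr < n and 0 <= col + dc < n]

def promotes(row):
    """True if a piece arriving on `row` is inside the opponent's three rows."""
    return row <= 2

print(targets("Knight", 4, 4, 9))  # [(2, 3), (2, 5)]
```

The table makes the ambiguity of the source visible: nothing in the excerpt fixes the board size `n` or the starting rows, which is exactly what has to be hypothesized.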



B Examples of Longest-Positions-to-Mate 

For N = 9 and N = 10, examples of a longest position-to-mate ($X_m$) and the 
resulting checkmate position $X_0$ are shown below. 

B.1 N = 9 

When N = 9, the longest position-to-mate has 94 steps. One such example is 
shown in Fig. 6. The mating sequence from $X_m$ (Fig. 6) to $X_0$ is shown below. 

Fig. 6. An example of a longest position-to-mate on the 9x9 board (King-alone 
side to move). 

Fig. 7. The position-in-mate derived from the position in Fig. 6 (King-alone side 
to move but unable, i.e., mate). 



B.2 N = 10 

When N = 10, the longest position-to-mate has 122 steps. Fig. 8 is one such 
example. The mating sequence from $X_m$ (Fig. 8) to $X_0$ is shown below: 



(△: King-alone side; ▲: King-and-Gold side; K: King; G: Gold) 

△9iK ▲9gG △8hK ▲8fG △7gK ▲7eG △6fK ▲6dG 
△5eK ▲5cG △4dK ▲4bG △3cK ▲3aG △2cK ▲2aK 
△3cK ▲3bG △2cK ▲3aK △2dK ▲4bK △3dK ▲5cK 
△4eK ▲3cG △5eK ▲4cG △4eK ▲6dK △5fK ▲4dG 
△6fK ▲4eG △5fK ▲5dK △5gK ▲5eK △6gK ▲4fG 
△5gK ▲5fG △6gK ▲4fK △7fK ▲5gG △7gK ▲5hG 
△6fK ▲4eK △7gK ▲5fK △7hK ▲5gK △7gK ▲6hG 
△7fK ▲5fK △7eK ▲5eK △8fK ▲6fK △8eK ▲6eK 
△8dK ▲7gG △9eK ▲7hG △8fK ▲8hG △8eK ▲8gG 
△8dK ▲8fG △7cK ▲7eG △6cK ▲6dG △5bK ▲5dK 
△4bK [...] △4aK ▲6bG △3aK ▲5aG △2aK ▲2cK 
△3aK ▲5bG △2aK ▲4bG △1aK ▲3aG mate (Fig. 9) 

Fig. 8. An example of a longest position-to-mate on the 10x10 board (King-alone 
side to move). 

Fig. 9. The position-in-mate derived from the position in Fig. 8 (King-alone side 
to move but unable, i.e., mate). 



C An Example of Non-trivial Drawn Positions 

Let us show, in Fig. 10, an example of a non-trivial drawn position described in 
the main text. 

Fig. 10. An example of a non-trivial drawn position (King-and-Gold side to 
move). 



(△: King-alone side; ▲: King-and-Gold side; K: King; G: Gold) 

▲[...] △2eK ▲3gG △1fK ▲3eK △1gK ▲3hG △1hK 
▲3iG △1iK ▲3fK △1jK ▲3gK △1kK ▲3hK △2jK 
▲4iK △1iK ▲3jG △1hK ▲4hK △2hK ▲3iG △2gK 
▲4gK △2fK ▲4fK △2eK ▲4eK △2dK (Fig. 10) 

We should note that in the position obtained from the position in Fig. 10 by 
deleting the upper row and the left column (Fig. 11), the King-and-Gold side is 
able to win, starting with the Gold move to 2g. Similarly, in the position 
obtained by deleting the lower row and the left column (see Fig. 12), the 
King-and-Gold side is able to win, starting with the Gold move to 3h. 

Fig. 11. Position obtained from the position in Fig. 10, by deleting the upper 
row and the left column (King-and-Gold side to move). 

Fig. 12. Position obtained from the position in Fig. 10, by deleting the lower 
row and the left column (King-and-Gold side to move). 